7 research outputs found
AAM: a dataset of Artificial Audio Multitracks for diverse music information retrieval tasks
We present a new dataset of 3000 artificial music tracks with rich annotations, based on real instrument samples and generated by algorithmic composition in accordance with music theory. Our collection provides ground-truth onset information and has several advantages over many available datasets. It can be used to compare and optimize algorithms for various music information retrieval tasks such as music segmentation, instrument recognition, source separation, onset detection, key and chord recognition, and tempo estimation. Because the audio is perfectly aligned to the original MIDI files, all annotations (onsets, pitches, instruments, keys, tempos, chords, beats, and segment boundaries) are absolutely precise. As a result, specific scenarios can be addressed, for instance, detection of segment boundaries only where instruments and key change, or onset detection only in tracks with drums and slow tempo. This allows for exhaustive evaluation and identification of the individual weak points of algorithms. In contrast to datasets of commercial music, all audio tracks are freely available, allowing the extraction of custom audio features. All music pieces are stored as single-instrument audio tracks plus a mix track, so that different augmentations and DSP effects can be applied to extend training sets and create individual mixes, e.g., for deep neural networks. In three case studies, we show how different algorithms and neural network models can be analyzed and compared for music segmentation, instrument recognition, and onset detection. In the future, the dataset can easily be extended to meet specific demands by adjusting the composition process.
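The single-instrument tracks plus mix-track layout described above lends itself to building custom mixes for training-set augmentation. A minimal sketch of that idea, with hypothetical stem names and gains (the real dataset contains audio files, represented here as short sample lists):

```python
# Sketch: building a custom mix from per-instrument stems, as a
# stems-plus-mix dataset layout allows. Stem names, gains, and sample
# values are invented for illustration.

def mix_stems(stems, gains):
    """Sum per-instrument sample streams after applying per-stem gain."""
    length = max(len(s) for s in stems.values())
    mix = [0.0] * length
    for name, samples in stems.items():
        g = gains.get(name, 1.0)  # unlisted stems keep unity gain
        for i, x in enumerate(samples):
            mix[i] += g * x
    return mix

stems = {
    "drums":  [0.5, -0.5, 0.5, -0.5],
    "bass":   [0.2, 0.2, -0.2, -0.2],
    "guitar": [0.1, 0.0, 0.1, 0.0],
}
custom = mix_stems(stems, {"drums": 0.8, "bass": 1.2})  # guitar kept at 1.0
```

The same loop extends naturally to per-stem DSP effects before summation, which is the augmentation use case the abstract mentions.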
Applications of Cross-Adaptive Audio Effects: Automatic Mixing, Live Performance and Everything in Between
This paper provides a systematic review of cross-adaptive audio effects and their applications. These effects extend the boundaries of traditional audio effects by potentially having many inputs and outputs, and by deriving their behavior from analysis of the signals. This mode of control allows the effects to adapt to different material, seemingly “being aware” of what they do to signals. By extension, cross-adaptive processes are designed to take into account features of, and relations between, several simultaneous signals. Thus a more global awareness and responsivity can be achieved in the processing system. When such a system is used in real time for music performance, we observe cross-adaptive performative effects. When a musician uses the signals of other performers directly to inform the timbral character of her own instrument, it enables a radical expansion of human-to-human interaction during music making. In order to give the signal interactions a sturdy frame of reference, we engage in a brief history of applications as well as a classification of effect types and clarifications in relation to earlier literature. With this background, the current paper defines the field, lays out a formal framework, explores technical aspects and applications, and considers the future of this growing field.
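A familiar instance of the cross-adaptive principle, where one signal's analysis controls another signal's processing, is sidechain ducking. A minimal sketch, with invented frame size and depth parameters, not taken from the paper:

```python
import math

# Minimal cross-adaptive effect sketch: the short-term RMS level of a
# control signal drives the gain applied to a target signal ("ducking").
# Frame size and depth values are arbitrary illustration choices.

def rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def duck(target, control, frame_size=4, depth=0.8):
    """Attenuate `target` in proportion to the energy of `control`."""
    out = []
    for start in range(0, len(target), frame_size):
        t = target[start:start + frame_size]
        c = control[start:start + frame_size]
        level = rms(c) if c else 0.0
        gain = max(0.0, 1.0 - depth * level)  # more control energy -> less gain
        out.extend(gain * x for x in t)
    return out

quiet = duck([1.0] * 8, [0.0] * 4 + [1.0] * 4)
# first frame is untouched (control silent); second frame is attenuated
```

A many-input, many-output system as described in the paper generalizes this: each effect parameter may depend on features of several simultaneous signals rather than a single control input.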
The sound engineer’s creativity: mediative practices and the recorded artifact
A portfolio of eight publications and one archival project is submitted for this
PhD by Portfolio. The portfolio covers aspects of record production studies,
record production education, sound studies, and audiovisual preservation, and
the interconnection between sound engineering practice, archival curation, and
research.
This thesis ties the portfolio together to frame the sound engineer as an integral
part of the creative network of record production. More specifically, the sound
engineer is a creative leader in this network who, through mediative practices,
authors sonic aesthetics, which are exemplified through archival materials.
This thesis examines literature within the aforementioned fields and positions
this thesis at their nexus. The literature review section identifies the gaps and
connections that this thesis exploits to provide new interdisciplinary knowledge.
The methodology for this thesis is reflexive Actor Network Theory (rANT), where
I examine the interactive connections between the sound engineer and the
recorded artifact. The methodology section also explores the various
frameworks used within the portfolio and shows how rANT works to unify this
submission.
This thesis explores the knowledge transfer of sound engineers, the recorded
artifact as evidence of engineering practice, the mediative practices that are
creatively employed by sound engineers, and the creative leadership position of
sound engineers as sonic aesthetic authors. These sections answer the
primary research question: how does the sound engineer employ creative
leadership in the recording studio?
Computational Modeling and Analysis of Multi-timbral Musical Instrument Mixtures
In the audio domain, the disciplines of signal processing, machine learning, psychoacoustics, information theory, and library science have merged into the field of Music Information Retrieval (Music-IR). Music-IR researchers attempt to extract high-level information from music, such as pitch, meter, genre, rhythm, and timbre, directly from audio signals, as well as semantic metadata over a wide variety of sources. This information is then used to organize and process data for large-scale retrieval and novel interfaces. On the content-creation side, access to hardware and software tools for producing music has become commonplace in the digital landscape. While the means to produce music are widely available, significant time must be invested to attain professional results. Mixing multi-channel audio requires techniques and training far beyond the knowledge of the average music software user. As a result, there is significant growth and development in intelligent signal processing for audio, an emergent field combining audio signal processing and machine learning for producing music. This work focuses on methods for modeling and analyzing multi-timbral musical instrument mixtures and performing automated processing techniques to improve audio quality based on quantitative and qualitative measures. The main contributions of the work involve training models to predict mixing parameters for multi-channel audio sources and developing new methods to model how individual timbres contribute to an overall mixture. Linear dynamical systems (LDS) are shown to be capable of learning the relative contributions of individual instruments to re-create a commercial recording based on acoustic features extracted directly from audio. Variations in the model topology are explored to make it applicable to a more diverse range of input sources and to improve performance. An exploration of relevant features for modeling timbre and identifying instruments is performed.
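The core estimation problem, recovering per-instrument contributions from features of the tracks and the mix, can be illustrated in a drastically simplified static form. The thesis uses linear dynamical systems; the sketch below replaces that with a two-track least-squares fit over made-up feature values, solved by hand via the 2x2 normal equations:

```python
# Toy version of estimating instrument contributions to a mix: solve
# least squares for gains (g1, g2) so that g1*x1 + g2*x2 best matches
# the mix feature sequence y. All feature values are invented; the
# actual work models temporal dynamics with an LDS, not a static fit.

def estimate_gains(x1, x2, y):
    """Closed-form 2x2 normal-equations solution of min ||g1*x1 + g2*x2 - y||."""
    a11 = sum(a * a for a in x1)
    a12 = sum(a * b for a, b in zip(x1, x2))
    a22 = sum(b * b for b in x2)
    b1 = sum(a * c for a, c in zip(x1, y))
    b2 = sum(b * c for b, c in zip(x2, y))
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)

x1, x2 = [1.0, 0.0, 2.0], [0.0, 1.0, 1.0]       # per-track feature frames
y = [0.5 * a + 2.0 * b for a, b in zip(x1, x2)]  # mix built with gains 0.5, 2.0
g1, g2 = estimate_gains(x1, x2, y)
```

With noiseless synthetic data the true gains are recovered exactly; the thesis addresses the harder case where the target is a real commercial mix and contributions vary over time.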
Using various basis decomposition techniques, audio examples are reconstructed and analyzed in a perceptual listening test to evaluate their ability to capture salient aspects of timbre. These tests show that a 2-D decomposition is able to capture much more perceptually relevant information with regard to the temporal evolution of the frequency spectrum of a set of audio examples. The results indicate that joint modeling of frequencies and their evolution is essential for capturing higher-level concepts in audio that we desire to leverage in automated systems.
Ph.D., Electrical Engineering, Drexel University, 201
Music Metadata Capture in the Studio from Audio and Symbolic Data
PhD
Music Information Retrieval (MIR) tasks, in the main, are concerned with
the accurate generation of one of a number of different types of music metadata
(beat onsets, or melody extraction, for example). Almost always,
they operate on fully mixed digital audio recordings. Commonly, this
means that a large amount of signal processing effort is directed towards
the isolation, and then identification, of certain highly relevant aspects of
the audio mix. In some cases, results of one MIR algorithm are useful, if
not essential, to the operation of another: a chord detection algorithm,
for example, is highly dependent upon accurate pitch detection. Although
not clearly defined in all cases, certain rules exist which we may take from
music theory in order to assist the task: the particular note intervals
which make up a specific chord, for example.
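The interval rule mentioned above can be made concrete: a chord label follows from the interval pattern of its pitch classes relative to a candidate root. A small sketch using standard triad templates (the template set and input notes are illustrative, not from the thesis):

```python
# Chord identification from music-theory interval templates. Each
# template is the set of semitone intervals above the root for a triad.

TRIADS = {
    frozenset({0, 4, 7}): "major",
    frozenset({0, 3, 7}): "minor",
    frozenset({0, 3, 6}): "diminished",
}

def identify_triad(midi_notes):
    """Try each note as the root; return (root pitch class, quality) or None."""
    pcs = {n % 12 for n in midi_notes}
    for root in pcs:
        intervals = frozenset((p - root) % 12 for p in pcs)
        if intervals in TRIADS:
            return root, TRIADS[intervals]
    return None

chord = identify_triad([60, 64, 67])  # C4, E4, G4 -> C major
```

This also illustrates the dependency noted above: the template match is only as good as the pitch estimates feeding it.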
On the question of generating accurate, low level music metadata (e.g.
chromatic pitch and score onset time), a potentially huge advantage lies
in the use of multitrack, rather than mixed, audio recordings, in which
the separate instrument recordings may be analysed in isolation.
Additionally, in MIR, as in many other research areas currently, there
is an increasing push towards the use of the Semantic Web for publishing
metadata using the Resource Description Framework (RDF). Semantic
Web technologies, though, also facilitate the querying of data via the
SPARQL query language, as well as logical inferencing via the careful
creation and use of web ontology language (OWL) ontologies. This, in
turn, opens up the intriguing possibility of deferring our decision regarding
which particular type of MIR query to ask of our low-level music
metadata until some point later down the line, long after all the heavy
signal processing has been carried out.
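The deferred-query idea can be illustrated without a full RDF stack: store low-level results as subject-predicate-object triples once, then ask different questions of them later with a pattern matcher standing in for SPARQL. The predicate names below are invented for the example:

```python
# Toy triple store: metadata captured once, queried many ways later.
# `None` in a pattern acts as a wildcard, mimicking a SPARQL variable.

triples = [
    ("note1", "hasPitch", "C4"),
    ("note1", "hasOnset", "0.00"),
    ("note2", "hasPitch", "E4"),
    ("note2", "hasOnset", "0.00"),
]

def match(pattern, store):
    """Return all triples matching a (subject, predicate, object) pattern."""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Two different queries over the same captured metadata, chosen after the fact:
pitches = match((None, "hasPitch", None), triples)   # all pitch annotations
note1 = match(("note1", None, None), triples)        # everything about note1
```

A real deployment would publish the triples as RDF and use SPARQL, gaining interoperability and OWL-based inferencing that this sketch omits.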
In this thesis, we describe an over-arching vision for an alternative MIR paradigm, built around the principles of early, studio-based metadata
capture, and exploitation of open, machine-readable Semantic Web
data. Using the specific example of structural segmentation, we demonstrate
that by analysing multitrack rather than mixed audio, we are able
to achieve a significant and quantifiable increase in the accuracy of our
segmentation algorithm. We also provide details of a new multitrack audio
dataset with structural segmentation annotations, created as part of
this research, and available for public use.
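One way multitrack access aids segmentation, greatly simplified relative to the thesis method and using invented activity data, is that section boundaries often coincide with changes in which instruments are active, information that is directly readable from stems but must be inferred from a mix:

```python
# Simplified multitrack segmentation cue: mark a boundary wherever the
# set of active instruments changes between bars. Activity data invented.

def activity_boundaries(activity):
    """`activity`: per-bar sets of active instrument names.
    Returns indices of bars that start a new section."""
    boundaries = []
    for i in range(1, len(activity)):
        if activity[i] != activity[i - 1]:
            boundaries.append(i)
    return boundaries

bars = [
    {"drums"}, {"drums"},
    {"drums", "bass"}, {"drums", "bass"},
    {"drums", "bass", "vocals"},
]
sections = activity_boundaries(bars)  # boundaries at bars 2 and 4
```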
Furthermore, we show that it is possible to fully implement a pair of
pattern discovery algorithms (the SIA and SIATEC algorithms, highly
applicable to, but not restricted to, symbolic music data analysis) using only
Semantic Web technologies: the SPARQL query language, acting on RDF
data, in tandem with a small OWL ontology. We describe the challenges
encountered by taking this approach, the particular solution we've arrived
at, and we evaluate the implementation both in terms of its execution time,
and also within the wider context of our vision for a new MIR paradigm.
EPSRC studentship no. EP/505054/1
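For readers unfamiliar with SIA, its core idea is compact enough to sketch in ordinary Python (the thesis implements it in SPARQL over RDF; the note data here is invented): for a set of (onset, pitch) points, group each point by the translation vector that maps it onto a later point, and each group is a maximal translatable pattern (MTP).

```python
from collections import defaultdict

# Compact sketch of the SIA pattern-discovery idea over (onset, pitch)
# points. For every ordered pair of points, record the difference vector;
# the points sharing a vector form that vector's MTP.

def sia(points):
    pts = sorted(points)
    mtps = defaultdict(list)
    for i, a in enumerate(pts):
        for b in pts[i + 1:]:                      # only "forward" vectors
            vec = (b[0] - a[0], b[1] - a[1])
            mtps[vec].append(a)
    return dict(mtps)

# Two-note ascending step repeated: (0,60)->(1,62) and (1,62)->(2,64)
notes = [(0, 60), (1, 62), (2, 64)]
patterns = sia(notes)
# patterns[(1, 2)] is the MTP translatable by "one beat later, two
# semitones higher": the points (0, 60) and (1, 62).
```

SIATEC extends this by computing, for each MTP, all of its occurrences; expressing both as SPARQL queries over RDF note data is the contribution described above.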
Semantic Audio Analysis Utilities and Applications.
PhD
Extraction, representation, organisation and application of metadata about audio recordings
are in the concern of semantic audio analysis. Our broad interpretation, aligned with recent
developments in the field, includes methodological aspects of semantic audio, such as
those related to information management, knowledge representation and applications of the
extracted information. In particular, we look at how Semantic Web technologies may be used
to enhance information management practices in two audio related areas: music informatics
and music production.
In the first area, we are concerned with music information retrieval (MIR) and related
research. We examine how structured data may be used to support reproducibility and
provenance of extracted information, and aim to support multi-modality and context adaptation
in the analysis. In creative music production, our goals can be summarised as follows:
Off-the-shelf sound editors do not hold appropriately structured information about the edited
material; thus, human-computer interaction is inefficient. We believe that recent developments
in sound analysis and music understanding are capable of bringing about significant improvements
in the music production workflow. Providing visual cues related to music structure can
serve as an example of intelligent, context-dependent functionality.
The central contributions of this work are a Semantic Web ontology for describing recording
studios, including a model of technological artefacts used in music production, methodologies
for collecting data about music production workflows and describing the work of
audio engineers, which facilitate capturing their contribution to music production, and finally
a framework for creating Web-based applications for automated audio analysis. This
has applications demonstrating how Semantic Web technologies and ontologies can facilitate
interoperability between music research tools, and the creation of semantic audio software, for
instance, for music recommendation, temperament estimation, or multi-modal music tutoring.