Harmonic Change Detection from Musical Audio
In this dissertation, we advance an enhanced method for computing Harte et al.’s [31] Harmonic Change Detection Function (HCDF). HCDF aims to detect harmonic transitions in musical audio signals and is crucial both for chord recognition in Music Information Retrieval (MIR) and for a wide range of creative applications. In light of recent advances in harmonic description and transformation, we depart from the original architecture of Harte et al.’s HCDF and revisit each of its component blocks, which are evaluated using an exhaustive grid search aimed at identifying optimal parameters across four large style-specific musical datasets. Our results show that the newly proposed methods and parameter optimization improve the detection of harmonic changes by 5.57% (f-score) with respect to previous methods. Furthermore, while guaranteeing recall values above 99%, our method improves precision by 6.28%. Aiming to leverage novel strategies for real-time harmonic-content audio processing, the optimized HCDF is made available for JavaScript and the Max and Pure Data multimedia programming environments. Moreover, all the data, as well as the Python code used to generate them, are made available.
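The HCDF pipeline the abstract refers to can be sketched in a few lines: project each chroma frame onto a 6-D tonal centroid space, then take the frame-to-frame distance. The projection coefficients below follow Harte et al.'s published tonal centroid transform, but the function names and the toy chromagram are illustrative, not the dissertation's released code, and smoothing is omitted for brevity.

```python
import numpy as np

# Minimal HCDF sketch (illustrative, not the optimized implementation
# released with the dissertation): chroma -> 6-D tonal centroid ->
# frame-to-frame Euclidean distance; peaks mark harmonic changes.

def tonal_centroid(chroma):
    """Project a 12-bin chroma vector onto the 6-D tonal centroid space."""
    pc = np.arange(12)
    phi = np.vstack([
        np.sin(pc * 7 * np.pi / 6), np.cos(pc * 7 * np.pi / 6),              # circle of fifths
        np.sin(pc * 3 * np.pi / 2), np.cos(pc * 3 * np.pi / 2),              # minor thirds
        0.5 * np.sin(pc * 2 * np.pi / 3), 0.5 * np.cos(pc * 2 * np.pi / 3),  # major thirds
    ])
    return phi @ chroma / (chroma.sum() or 1.0)

def hcdf(chromagram):
    """Distance between consecutive tonal centroids."""
    centroids = np.array([tonal_centroid(c) for c in chromagram])
    return np.linalg.norm(np.diff(centroids, axis=0), axis=1)

# Toy chromagram: three C-major frames followed by three G-major frames;
# the detection function peaks at the chord boundary (index 2).
c_maj = np.zeros(12); c_maj[[0, 4, 7]] = 1.0
g_maj = np.zeros(12); g_maj[[7, 11, 2]] = 1.0
detection = hcdf(np.array([c_maj] * 3 + [g_maj] * 3))
```

In the full method, a peak-picking stage over the (smoothed) detection curve yields the harmonic-change positions.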
Automatic characterization and generation of music loops and instrument samples for electronic music production
Repurposing audio material to create new music - also known as sampling - was foundational to electronic music and remains a fundamental component of this practice. Currently, large-scale audio databases offer vast collections of material for users to work with. Navigation of these databases relies heavily on hierarchical tree directories. Consequently, sound retrieval is tiresome and often identified as an undesired interruption in the creative process.
We address two fundamental methods for navigating sounds: characterization and generation. Characterizing loops and one-shots in terms of instruments or instrumentation allows for organizing unstructured collections and faster retrieval for music-making. The generation of loops and one-shot sounds enables the creation of new sounds not present in an audio collection through interpolation or modification of the existing material. To achieve this, we employ deep-learning-based, data-driven methodologies for classification and generation.
A Memetic Analysis of a Phrase by Beethoven: Calvinian Perspectives on Similarity and Lexicon-Abstraction
This article discusses some general issues arising from the study of similarity in music, both human-conducted and computer-aided, and then progresses to a consideration of similarity relationships between patterns in a phrase by Beethoven, from the first movement of the Piano Sonata in A flat major op. 110 (1821), and various potential memetic precursors. This analysis is followed by a consideration of how the kinds of similarity identified in the Beethoven phrase might be understood in psychological/conceptual and then neurobiological terms, the latter by means of William Calvin’s Hexagonal Cloning Theory. This theory offers a mechanism for the operation of David Cope’s concept of the lexicon, conceived here as a museme allele-class. I conclude by attempting to correlate and map the various spaces within which memetic replication occurs.
Modelling the perception and composition of Western musical harmony.
PhD Thesis
Harmony is a fundamental structuring principle in Western music, determining how simultaneously occurring musical notes combine to form chords, and how successions of chords combine to form chord progressions. Harmony is interesting to psychologists because it unites many core features of auditory perception and cognition, such as pitch perception, auditory scene analysis, and statistical learning. A current challenge is to formalise our psychological understanding of harmony through computational modelling. Here we detail computational studies of three core dimensions of harmony: consonance, harmonic expectation, and voice leading. These studies develop and evaluate computational models of the psychoacoustic and cognitive processes involved in harmony perception, and quantitatively model how these processes contribute to music composition. Through these studies we examine long-standing issues in music psychology, such as the relative contributions of roughness and harmonicity to consonance perception, the roles of low-level psychoacoustic and high-level cognitive processes in harmony perception, and the probabilistic nature of harmonic expectation. We also develop cognitively informed computational models that are capable of both analysing existing music and generating new music, with potential applications in computational creativity, music informatics, and music psychology. This thesis is accompanied by a collection of open-source software packages that implement the models developed and evaluated here, which we hope will support future research into the psychological foundations of musical harmony.
Artificial intelligence, education and music: the use of artificial intelligence to encourage and facilitate music composition by novices
The goal of the research described in this thesis is to find ways of using artificial intelligence to encourage and facilitate music composition by musical novices, particularly those without traditional musical skills. Two complementary approaches are presented.
We show how two recent cognitive theories of harmony can be used to design a new kind of direct manipulation tool for music, known as "Harmony Space", with the expressivity to allow novices to sketch, analyse, modify and compose harmonic sequences simply and clearly by moving two-dimensional patterns on a computer screen linked to a synthesizer. Harmony Space provides novices with a way of describing and controlling harmonic structures and relationships using a single, principled, uniform spatial metaphor at various musical levels: note level, interval level, chord level, harmonic succession level and key level. A prototype interface has been implemented to demonstrate the coherence and feasibility of the design. An investigation with a small number of subjects demonstrates that Harmony Space considerably reduces the prerequisites required for novices to learn about, sketch, analyse and experiment with harmony - activities that would normally be very difficult for them without considerable theoretical knowledge or instrumental skill.
The second part of the thesis presents work towards a knowledge-based tutoring system to help novices using the interface to compose chord sequences. It is argued that traditional, remedial intelligent tutoring system approaches are inadequate for tutoring in domains that require open-ended thinking. The foundation of a new approach is developed based on the exploration and transformation of case studies described in terms of chunks, styles and plans. This approach draws on a characterisation of creativity due to Johnson-Laird (1988). Programs have been implemented to illustrate the feasibility of key parts of the new approach.
A Derivation of the Tonal Hierarchy from Basic Perceptual Processes
In recent decades music psychologists have explained the functioning of tonal music in terms of the tonal hierarchy, a stable schema of relative structural importance that helps us interpret the events in a passage of tonal music. This idea has been most influentially disseminated by Carol Krumhansl in her 1990 monograph Cognitive Foundations of Musical Pitch. Krumhansl hypothesized that this sense of the importance or centrality of certain tones of a key is learned through exposure to tonal music, in particular by learning the relative frequency of appearance of the various pitch classes in tonal passages. The correlation of pitch-class quantity and structural status has been the subject of a number of successful studies, leading to the general acceptance of the pitch-distributional account of tonal hierarchy in the field of music psychology.
This study argues that the correlation of pitch-class quantity with structural status is a byproduct of other, more fundamental perceptual properties, all of which are derived from aspects of everyday listening. Individual chapters consider the phenomena of consonance and dissonance, intervallic rootedness, the short-term memory for pitch collection, and the interaction of temporal ordering and voice-leading that Jamshed Bharucha calls melodic anchoring. The study concludes with an elaborate self-experiment that observes the interaction of these properties in a pool of 275 stimuli, each of which is constructed from a single dyad plus one subsequent tone.
Exploiting prior knowledge during automatic key and chord estimation from musical audio
Chords and keys are two ways of describing music. They are exemplary of a general class of symbolic notations that musicians use to exchange information about a music piece. This information can range from simple tempo indications such as “allegro” to precise instructions for a performer of the music. Concretely, both keys and chords are timed labels that describe the harmony during certain time intervals, where harmony refers to the way music notes sound together. Chords describe the local harmony, whereas keys offer a more global overview and consequently cover a sequence of multiple chords.
Common to all music notations is that certain characteristics of the music are described while others are ignored. The adopted level of detail depends on the purpose of the intended information exchange. A simple description such as “menuet”, for example, only serves to roughly describe the character of a music piece. Sheet music on the other hand contains precise information about the pitch, discretised information pertaining to timing and limited information about the timbre. Its goal is to permit a performer to recreate the music piece. Even so, the information about timing and timbre still leaves some space for interpretation by the performer.
The opposite of a symbolic notation is a music recording. It stores the music in a way that allows for a perfect reproduction. The disadvantage of a music recording is that it does not allow one to manipulate a single aspect of a music piece in isolation, or at least not without degrading the quality of the reproduction. For instance, it is not possible to change the instrumentation in a music recording, even though this would only require the simple change of a few symbols in a symbolic notation.
Despite the fundamental differences between a music recording and a symbolic notation, the two are of course intertwined. Trained musicians can listen to a music recording (or live music) and write down a symbolic notation of the played piece. This skill allows one, in theory, to create a symbolic notation for each recording in a music collection. In practice, however, this would be too labour intensive for the large collections that are available these days through online stores or streaming services. Automating the notation process is therefore a necessity, and this is exactly the subject of this thesis. More specifically, this thesis deals with the extraction of keys and chords from a music recording. A database with keys and chords opens up applications that are not possible with a database of music recordings alone. On the one hand, chords can be used on their own as a compact representation of a music piece, for example to learn how to play an accompaniment for singing. On the other hand, keys and chords can also be used indirectly to accomplish another goal, such as finding similar pieces.
Because music theory has been studied for centuries, a great body of knowledge about keys and chords is available. It is known that consecutive keys and chords form sequences that are far from random. People have certain expectations that must be fulfilled in order to experience music as pleasant. Keys and chords are also strongly intertwined, as a given key implies that certain chords will likely occur, and a set of given chords implies an encompassing key in return. Consequently, a substantial part of this thesis is concerned with the question of whether musicological knowledge can be embedded in a technical framework in such a way that it helps to improve the automatic recognition of keys and chords.
The technical framework adopted in this thesis is built around a hidden Markov model (HMM). This facilitates an easy separation of the different aspects involved in the automatic recognition of keys and chords. Most experiments reviewed in the thesis focus on taking into account musicological knowledge about the musical context and about the expected chord duration. Technically speaking, this involves a manipulation of the transition probabilities in the HMMs. To account for the interaction between keys and chords, every HMM state actually represents the combination of a key and a chord label.
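The combined key-chord state space described above can be illustrated with a toy transition matrix. All the probabilities and the factorisation below are hypothetical placeholders, not the models trained in the thesis; the point is only that each transition over (key, chord) pairs can be assembled from a key-change model and a chord-change model.

```python
import numpy as np
from itertools import product

# Toy illustration: each HMM state is a (key, chord) pair, and each
# transition probability factors into a key-change model and a
# key-conditioned chord-change model. Numbers are made up for the example.
keys = ["C", "G"]
chords = ["C", "F", "G"]
states = list(product(keys, chords))   # 6 combined states

def key_change_prob(k_from, k_to):
    # assumed self-transition-heavy key model: keys rarely change
    return 0.95 if k_from == k_to else 0.05

def chord_change_prob(key, c_from, c_to):
    # placeholder chord model: uniform over chords within the target key
    return 1.0 / len(chords)

# transition matrix over the combined (key, chord) state space
A = np.array([[key_change_prob(kf, kt) * chord_change_prob(kt, cf, ct)
               for (kt, ct) in states]
              for (kf, cf) in states])
# every row of A is a probability distribution over all six combined states
```

A trained system would replace both factor functions with probabilities estimated from annotated corpora or derived from music theory, as the thesis describes.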
In the first part of the thesis, a number of alternatives for modelling the context are proposed. In particular, separate key change and chord change models are defined such that they closely mirror the way musicians conceive harmony. Multiple variants are considered that differ in the size of the context that is accounted for and in the knowledge source from which they were compiled. Some models are derived from a music corpus with key and chord notations whereas others follow directly from music theory.
In the second part of the thesis, the contextual models are embedded in a system for automatic key and chord estimation. The features used in that system are so-called chroma profiles, which represent the saliences of the pitch classes in the audio signal. These chroma profiles are acoustically modelled by means of templates (idealised profiles) and a distance measure. In addition to these acoustic models and the contextual models developed in the first part, durational models are also required. The latter ensure that the chord and key estimations attain specified mean durations.
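The template-plus-distance idea can be sketched with a toy example: score an observed chroma profile against idealised binary chord templates and pick the closest. The binary templates and the cosine distance below are assumptions for illustration; the thesis's actual acoustic models may use different profiles and a different distance measure.

```python
import numpy as np

# Toy template-matching sketch (assumed binary templates, cosine distance).
TEMPLATE_PCS = {
    "C:maj": [0, 4, 7],
    "A:min": [9, 0, 4],
    "G:maj": [7, 11, 2],
}

def template(pitch_classes):
    """Idealised unit-norm binary chroma profile for a chord."""
    t = np.zeros(12)
    t[pitch_classes] = 1.0
    return t / np.linalg.norm(t)

def best_chord(chroma):
    """Return the template with the smallest cosine distance to the input."""
    chroma = chroma / np.linalg.norm(chroma)
    distances = {name: 1.0 - float(chroma @ template(pcs))
                 for name, pcs in TEMPLATE_PCS.items()}
    return min(distances, key=distances.get)

observed = np.zeros(12)
observed[[0, 4, 7]] = [1.0, 0.8, 0.9]   # noisy C-major-like profile
```

In the full system these per-frame distances act as the acoustic scores that the HMM combines with the contextual and durational models.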
The resulting system is then used to conduct experiments that provide more insight into how each system component contributes to the ultimate key and chord output quality. During the experimental study, the system complexity is gradually increased, starting from a system containing only an acoustic model of the features, which is subsequently extended, first with duration models and afterwards with contextual models. The experiments show that taking into account the mean key and mean chord duration is essential to arrive at acceptable results for both key and chord estimation. The effect of using contextual information, however, is highly variable. On the one hand, the chord change model has only a limited positive impact on the chord estimation accuracy (two to three percentage points), but this impact is fairly stable across different model variants. On the other hand, the chord change model has a much larger potential to improve the key output quality (up to seventeen percentage points), but only on the condition that the variant of the model is well adapted to the tested music material. Lastly, the key change model has only a negligible influence on the system performance.
In the final part of this thesis, a couple of extensions to the formerly presented system are proposed and assessed. First, the global mean chord duration is replaced by key-chord specific values, which has a positive effect on the key estimation performance. Next, the HMM system is modified such that the prior chord duration distribution is no longer a geometric distribution but one that better approximates the observed durations in an appropriate data set. This modification leads to a small improvement of the chord estimation performance, but of course, it requires the availability of a suitable data set with chord notations from which to retrieve a target durational distribution. A final experiment demonstrates that increasing the scope of the contextual model only leads to statistically insignificant improvements. On top of that, the required computational load increases greatly.
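The geometric duration prior mentioned above follows directly from the HMM's self-transition probability: staying in a state with probability p for d-1 frames and leaving once gives P(d) = (1-p)p^(d-1), with mean 1/(1-p). A small sketch with illustrative numbers shows how a target mean duration fixes p.

```python
import numpy as np

# Illustrative sketch (not the thesis code): a standard HMM self-transition
# probability p implies a geometric prior on state duration, so a target
# mean duration (in frames) directly determines p.

def self_transition_for_mean(mean_frames):
    # geometric duration: mean = 1 / (1 - p)  =>  p = 1 - 1/mean
    return 1.0 - 1.0 / mean_frames

p = self_transition_for_mean(20.0)          # target mean chord duration: 20 frames
durations = np.arange(1, 200)
prior = (1 - p) * p ** (durations - 1)      # P(duration = d) under the geometric prior
mean_duration = (prior * durations).sum()   # recovered mean, close to 20 frames
```

Replacing this implicit geometric prior with an empirically observed duration distribution is exactly the modification the paragraph above reports, which requires annotated chord durations to estimate the target distribution.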