
    Computational Tonality Estimation: Signal Processing and Hidden Markov Models

    This thesis investigates computational musical tonality estimation from an audio signal. We present a hidden Markov model (HMM) in which relationships between chords and keys are expressed as probabilities of emitting observable chords from a hidden key sequence. The model is tested first using symbolic chord annotations as observations, and gives excellent global key recognition rates on a set of Beatles songs. The initial model is extended for audio input by using an existing chord recognition algorithm, which allows it to be tested on a much larger database. We show that a simple model of the upper partials in the signal improves recognition scores. We also present a variant of the HMM with a continuous observation probability density, but show that the discrete version performs better. There follows a detailed analysis of how changing the low-level signal processing parameters affects key estimation and computation time. We find that much of the high-frequency information can be omitted without loss of accuracy, and that significant computational savings can be made by applying a threshold to the transform kernels. Results show that there is no single ideal set of parameters for all music, but that tuning the parameters can make a difference to accuracy. We discuss methods of evaluating tonal changes more complex than a single global key, and compare a metric that measures similarity to a ground truth with metrics rooted in music retrieval. We show that the two kinds of measure give different results, and so recommend that the choice of evaluation metric be determined by the intended application. Finally, we draw together our conclusions and use them to suggest areas for continuing this research: tonality model development, feature extraction, evaluation methodology, and applications of computational tonality estimation. This work was funded by the Engineering and Physical Sciences Research Council (EPSRC).
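    To make the key-as-hidden-state idea concrete, here is a minimal, self-contained sketch (not the thesis's trained model) of decoding a hidden key sequence from observed chord symbols with the Viterbi algorithm. The 24 major/minor key states, the toy diatonic emission values and the self-transition probability are all illustrative assumptions.

        import numpy as np

        PITCHES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
        KEYS = [p + m for m in ('', 'm') for p in PITCHES]   # 24 hidden key states
        CHORDS = list(KEYS)                                  # observed major/minor triads

        def emission(key, chord):
            # Toy emission model: the six major/minor triads diatonic to the key
            # share 90% of the probability mass; the other 18 chords share 10%.
            tonic, root = PITCHES.index(key.rstrip('m')), PITCHES.index(chord.rstrip('m'))
            degree, chord_minor = (root - tonic) % 12, chord.endswith('m')
            diatonic = {(0, False), (5, False), (7, False), (2, True), (4, True), (9, True)}
            if key.endswith('m'):                            # rotate pattern for minor keys
                diatonic = {((d + 3) % 12, m) for d, m in diatonic}
            return 0.90 / 6 if (degree, chord_minor) in diatonic else 0.10 / 18

        def viterbi(chords, self_p=0.98):
            k, n = len(KEYS), len(chords)
            A = np.full((k, k), (1 - self_p) / (k - 1))      # sticky key transitions
            np.fill_diagonal(A, self_p)
            logA = np.log(A)
            logB = np.log([[emission(key, c) for c in CHORDS] for key in KEYS])
            obs = [CHORDS.index(c) for c in chords]
            delta = np.full((n, k), -np.inf)                 # best log-prob per state
            psi = np.zeros((n, k), dtype=int)                # backpointers
            delta[0] = np.log(1.0 / k) + logB[:, obs[0]]
            for t in range(1, n):
                scores = delta[t - 1][:, None] + logA
                psi[t] = scores.argmax(axis=0)
                delta[t] = scores[psi[t], np.arange(k)] + logB[:, obs[t]]
            path = [int(delta[-1].argmax())]
            for t in range(n - 1, 0, -1):
                path.append(int(psi[t, path[-1]]))
            return [KEYS[s] for s in reversed(path)]

        print(viterbi(['C', 'F', 'G', 'C', 'Am', 'F', 'G', 'C']))

    With this all-diatonic test sequence the decoder settles on C major throughout, illustrating how a key, as the hidden state, summarises a run of chords.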

    FROM MUSIC INFORMATION RETRIEVAL (MIR) TO INFORMATION RETRIEVAL FOR MUSIC (IRM)

    This thesis reviews and discusses certain techniques from the domain of (Music) Information Retrieval, in particular some general data mining algorithms, and describes their specific adaptations for use as building blocks in the CACE4 software application. The use of Augmented Transition Networks (ATN) from the field of (Music) Information Retrieval is adequate, to a certain extent, as long as one keeps the underlying tonal constraints and rules as a guide to understanding the structure one is looking for. However, since a large proportion of algorithmic music, including music composed by the author, is atonal, tonal constraints and rules are of little use. Analysis methods from Hierarchical Clustering Techniques (HCT) such as k-means and Expectation-Maximisation (EM) facilitate other approaches and are better suited to finding (clustered) structures in large data sets. ART2 (Adaptive Resonance Theory) neural networks, for example, can be used to analyse and categorise these data sets. Statistical tools such as histogram analysis and calculations of mean, variance and correlation can provide information about connections between members of a data set. Altogether this provides a diverse palette of usable data analysis methods and strategies for creating algorithmic atonal music. Acting now as (software) strategy tools, their use is determined by the quality of their output within a musical context, as demonstrated when they were developed and programmed into the Computer Assisted Composition Environment CACE4. Music Information Retrieval is thereby inverted: its techniques and the associated methods of Information Retrieval and general data mining are used to access the organisation and constraints of abstract (not specifically musical) data in order to use and transform it in a musical composition.
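    As a small illustration of the HCT building blocks mentioned above, the following sketch clusters abstract feature vectors with a plain k-means loop. The random features and the choice of k are stand-ins; CACE4's actual adaptations are not reproduced here.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(300, 4))        # 300 events x 4 abstract musical parameters

        def kmeans(X, k=3, iters=50):
            # Start from k points drawn from the data itself.
            centroids = X[rng.choice(len(X), size=k, replace=False)]
            for _ in range(iters):
                # Assign every point to its nearest centroid (Euclidean distance).
                labels = np.linalg.norm(X[:, None, :] - centroids, axis=2).argmin(axis=1)
                # Move each centroid to the mean of its assigned points.
                new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centroids[j] for j in range(k)])
                if np.allclose(new, centroids):
                    break
                centroids = new
            return labels, centroids

        labels, _ = kmeans(X)
        print(np.bincount(labels))           # how many events fall in each cluster

    A composition system can then treat each cluster as a pool of related material, which is the kind of clustered structure the abstract refers to.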

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    Extraction and representation of semantic information in digital media


    A model of sonority based on pitch intelligibility

    Synopsis: Sonority is a central notion in phonetics and phonology, and it is essential for generalizations related to syllabic organization. However, to date there is no clear consensus on the phonetic basis of sonority, in either perception or production. The widely used Sonority Sequencing Principle (SSP) represents the speech signal as a sequence of discrete units, where phonological processes are modeled as symbol-manipulating rules that lack a temporal dimension and are devoid of inherent links to perceptual, motoric or cognitive processes. The current work aims to change this by outlining a novel approach for the extraction of continuous entities from acoustic space in order to model dynamic aspects of phonological perception. It is used here to advance a functional understanding of sonority as a universal aspect of prosody that requires pitch-bearing syllables as the building blocks of speech. This book argues that sonority is best understood as a measurement of pitch intelligibility in perception, which is closely linked to periodic energy in acoustics. It presents a novel principle for sonority-based determinations of well-formedness, the Nucleus Attraction Principle (NAP). Two complementary NAP models independently account for symbolic and continuous representations, and they mostly outperform SSP-based models, as demonstrated here with experimental perception studies and with a corpus study of Modern Hebrew nouns. The work also describes ProPer (Prosodic Analysis with Periodic Energy), a toolbox that further exploits the proposal that periodic energy reflects sonority in order to cover major topics in prosodic research, such as prominence, intonation and speech rate. The book concludes with brief discussions of selected topics: (i) the phonotactic division of labor with respect to /s/-stop clusters; (ii) the debate about the universality of sonority; and (iii) the fate of the classic phonetics–phonology dichotomy as it relates to continuity and dynamics in phonology.
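    The acoustic quantity the book links to sonority can be sketched as follows: weight each analysis frame's energy by a periodicity strength taken from the normalised autocorrelation. This is a minimal illustration of the idea of periodic energy, not the ProPer toolbox's actual implementation; the frame sizes and pitch range are assumed values.

        import numpy as np

        def periodic_energy(signal, sr, frame=0.03, hop=0.01, fmin=75, fmax=400):
            n, h = int(frame * sr), int(hop * sr)
            lo, hi = int(sr / fmax), int(sr / fmin)       # candidate pitch-period lags
            out = []
            for start in range(0, len(signal) - n, h):
                x = signal[start:start + n] * np.hanning(n)
                energy = float(np.sum(x ** 2))
                ac = np.correlate(x, x, mode='full')[n - 1:]  # autocorrelation, lags >= 0
                strength = ac[lo:hi].max() / ac[0] if ac[0] > 0 else 0.0
                out.append(energy * max(strength, 0.0))   # energy scaled by periodicity
            return np.array(out)

        # Toy check: a vowel-like harmonic tone should score higher than white noise.
        sr = 16000
        t = np.arange(sr) / sr
        vowel = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 5))
        noise = np.random.default_rng(1).normal(size=sr)
        print(periodic_energy(vowel, sr).mean(), periodic_energy(noise, sr).mean())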

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010

    Exploiting prior knowledge during automatic key and chord estimation from musical audio

    Chords and keys are two ways of describing music. They are exemplary of a general class of symbolic notations that musicians use to exchange information about a music piece. This information can range from simple tempo indications such as “allegro” to precise instructions for a performer. Concretely, both keys and chords are timed labels that describe the harmony during certain time intervals, where harmony refers to the way music notes sound together. Chords describe the local harmony, whereas keys offer a more global overview and consequently cover a sequence of multiple chords. Common to all music notations is that certain characteristics of the music are described while others are ignored; the adopted level of detail depends on the purpose of the intended information exchange. A simple description such as “menuet”, for example, only serves to roughly describe the character of a music piece. Sheet music, on the other hand, contains precise information about pitch, discretised information pertaining to timing, and limited information about timbre. Its goal is to permit a performer to recreate the music piece, even though the information about timing and timbre still leaves some space for interpretation.

    The opposite of a symbolic notation is a music recording, which stores the music in a way that allows for a perfect reproduction. The disadvantage of a music recording is that it does not allow one to manipulate a single aspect of a music piece in isolation, or at least not without degrading the quality of the reproduction. For instance, it is not possible to change the instrumentation in a music recording, even though this would require only the simple change of a few symbols in a symbolic notation. Despite the fundamental differences between a music recording and a symbolic notation, the two are of course intertwined. Trained musicians can listen to a music recording (or live music) and write down a symbolic notation of the played piece. This skill allows one, in theory, to create a symbolic notation for each recording in a music collection. In practice, however, this would be too labour-intensive for the large collections that are available these days through online stores or streaming services. Automating the notation process is therefore a necessity, and this is exactly the subject of this thesis. More specifically, this thesis deals with the extraction of keys and chords from a music recording.

    A database of keys and chords opens up applications that are not possible with a database of music recordings alone. On the one hand, chords can be used on their own as a compact representation of a music piece, for example to learn how to play an accompaniment for singing. On the other hand, keys and chords can also be used indirectly to accomplish another goal, such as finding similar pieces. Because music theory has been studied for centuries, a great body of knowledge about keys and chords is available. It is known that consecutive keys and chords form sequences that are anything but random: people have certain expectations that must be fulfilled in order to experience music as pleasant. Keys and chords are also strongly intertwined, as a given key implies that certain chords will likely occur, and a set of given chords implies an encompassing key in return.

    Consequently, a substantial part of this thesis is concerned with the question whether musicological knowledge can be embedded in a technical framework in such a way that it helps to improve the automatic recognition of keys and chords. The technical framework adopted in this thesis is built around a hidden Markov model (HMM), which facilitates an easy separation of the different aspects involved. Most experiments reviewed in the thesis focus on taking into account musicological knowledge about the musical context and about the expected chord duration. Technically speaking, this involves a manipulation of the transition probabilities in the HMMs. To account for the interaction between keys and chords, every HMM state represents the combination of a key label and a chord label.

    In the first part of the thesis, a number of alternatives for modelling the context are proposed. In particular, separate key change and chord change models are defined such that they closely mirror the way musicians conceive harmony. Multiple variants are considered that differ in the size of the context that is accounted for and in the knowledge source from which they were compiled: some models are derived from a music corpus with key and chord notations, whereas others follow directly from music theory.

    In the second part of the thesis, the contextual models are embedded in a system for automatic key and chord estimation. The features used in that system are so-called chroma profiles, which represent the saliences of the pitch classes in the audio signal. These chroma profiles are acoustically modelled by means of templates (idealised profiles) and a distance measure. In addition to these acoustic models and the contextual models developed in the first part, durational models are also required; the latter ensure that the chord and key estimations attain specified mean durations. The resulting system is then used to conduct experiments that provide more insight into how each system component contributes to the ultimate key and chord output quality. During the experimental study, the system complexity is gradually increased, starting from a system containing only an acoustic model of the features, which is subsequently extended, first with duration models and afterwards with contextual models. The experiments show that taking into account the mean key and mean chord duration is essential to arrive at acceptable results for both key and chord estimation. The effect of using contextual information, however, is highly variable. On the one hand, the chord change model has only a limited positive impact on the chord estimation accuracy (two to three percentage points), but this impact is fairly stable across different model variants. On the other hand, the chord change model has a much larger potential to improve the key output quality (up to seventeen percentage points), but only on the condition that the variant of the model is well adapted to the tested music material. Lastly, the key change model has only a negligible influence on the system performance.

    In the final part of this thesis, a couple of extensions to the formerly presented system are proposed and assessed. First, the global mean chord duration is replaced by key-chord specific values, which has a positive effect on the key estimation performance. Next, the HMM system is modified such that the prior chord duration distribution is no longer a geometric distribution but one that better approximates the observed durations in an appropriate data set. This modification leads to a small improvement of the chord estimation performance, but it of course requires the availability of a suitable data set with chord notations from which to retrieve a target durational distribution. A final experiment demonstrates that increasing the scope of the contextual model only leads to statistically insignificant improvements, while the required computational load increases greatly.
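    As a concrete illustration of two of the components just described, the sketch below matches a chroma frame against idealised binary triad templates using cosine similarity, and derives an HMM self-transition probability from a target mean duration under a geometric duration prior. The template shape, the similarity measure and all numbers are illustrative assumptions, not the thesis's tuned models.

        import numpy as np

        PITCHES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

        def triad_template(root, minor=False):
            # Idealised chroma profile: energy only at root, third and fifth.
            t = np.zeros(12)
            t[[root, (root + (3 if minor else 4)) % 12, (root + 7) % 12]] = 1.0
            return t / np.linalg.norm(t)

        TEMPLATES = {PITCHES[r] + ('m' if m else ''): triad_template(r, m)
                     for r in range(12) for m in (False, True)}

        def chord_scores(chroma):
            # Cosine similarity of one observed chroma frame to every template.
            v = chroma / (np.linalg.norm(chroma) + 1e-9)
            return {name: float(v @ t) for name, t in TEMPLATES.items()}

        def self_transition(mean_frames):
            # A geometric duration prior with mean d frames implies p(stay) = 1 - 1/d.
            return 1.0 - 1.0 / mean_frames

        frame = np.array([1.0, 0, 0, 0, 0.8, 0, 0, 0.9, 0, 0.1, 0, 0])  # C-E-G heavy
        scores = chord_scores(frame)
        print(max(scores, key=scores.get), self_transition(20))          # 'C' 0.95

    Replacing the single global mean duration by key-chord specific values, as in the final part of the thesis, then amounts to giving each HMM state its own self-transition probability.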

    Compositions Utilizing Fractal Flame Algorithms

    “Music, by its very abstract nature, is the first of the arts to have attempted reconciliation of artistic creation with scientific thought” – Xenakis, 1992

    This portfolio explores how the iterative and recursive processes employed within fractal flame algorithms can be used to create new and aesthetically pleasing micro and macro sounds from which coherent compositions can be created. A variety of existing electronic compositional procedures, including wave-set substitution and granular synthesis, as well as a number of classical compositional practices, such as hocketing, are deployed to generate a complex and diverse set of compositions. The portfolio shows how marrying these sound-manipulating techniques and compositional processes with the sonic events produced by the unexplored field of fractal flame algorithms has allowed me to generate – in the words of Iannis Xenakis – ‘sounds that have never existed before’. The portfolio shows the creative potential fractal flame programs have for electronic music generation and how they offer a terra nova (new land) upon which computer-generated music can lay down solid foundations and expand in new directions to harvest exciting results.
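    For readers unfamiliar with the technique, the sketch below runs the chaos-game iteration at the heart of a fractal flame: a randomly chosen affine map followed by a nonlinear variation (here the classic sinusoidal one). Mapping each iterate to a grain's onset and frequency is only one hypothetical sonification, not the method used in the portfolio; the transform coefficients are arbitrary.

        import random, math

        TRANSFORMS = [                       # illustrative affine coefficients
            (0.5, 0.0, 0.0, 0.5, -0.5, 0.0),
            (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
            (0.4, -0.3, 0.3, 0.4, 0.0, 0.5),
        ]

        def step(x, y):
            a, b, c, d, e, f = random.choice(TRANSFORMS)
            u, v = a * x + b * y + e, c * x + d * y + f   # affine part
            return math.sin(u), math.sin(v)               # "sinusoidal" variation

        random.seed(4)
        x, y = 0.1, 0.1
        grains = []
        for i in range(2000):
            x, y = step(x, y)
            if i >= 20:                                   # discard transient iterates
                freq = 220.0 * 2 ** (x + 1)               # map x to grain frequency (Hz)
                onset = (i - 20) * 0.01 + 0.005 * y       # map y to onset jitter (s)
                grains.append((round(onset, 4), round(freq, 2)))
        print(grains[:5])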

    Third International Conference on Technologies for Music Notation and Representation TENOR 2017

    The third International Conference on Technologies for Music Notation and Representation seeks to focus on a set of specific research issues associated with music notation that were elaborated at the first two editions of TENOR in Paris and Cambridge. The theme of the conference is vocal music, whereas the pre-conference workshops focus on innovative technological approaches to music notation.

    From a musical protolanguage to rhythm and tonality

    Final project for the Master in Cognitive Science and Language (Màster en Ciència Cognitiva i Llenguatge), Facultat de Filosofia, Universitat de Barcelona, 2014-2015. Supervisor: Joana Rosselló Ximenes.

    Music and language are two faculties that have evolved only in humans, and through mutual interaction. As Darwin (1871) suggested, before speaking, our ancestors were able to sing in a way structurally and functionally similar to what birds do. At that stage, a musical protolanguage with beat provided a common basis for music and language. Hierarchical recursion, along with grammar and lexical meaning, joined this musical protolanguage and gave rise to language. Linguistic recursion, in turn, made meter possible; rhythm therefore would have preceded tonality. Subsequently, in parallel with the emergence of grammar, harmony and tonality were added to meter. That beat is more primitive than meter is suggested by the fact that some animals perceive it but do not externalize it; crucially, these animals are all vocal learners. Externalization, whether of musical rhythm or of language, requires complex social behaviour, which for rhythm is already present in the drumming behaviour of certain primates. The role of vocalizations goes even further: their harmonic spectrum underpinned the tones of our musical scales. Thus, driven to a large extent by language, music has become what we know today.