363 research outputs found

    Automatic chord transcription from audio using computational models of musical context

    PhD
    This thesis is concerned with the automatic transcription of chords from audio, with an emphasis on modern popular music. Musical context, such as the key and the structural segmentation, helps human listeners interpret chords. In this thesis we propose computational models that integrate such musical context into the automatic chord estimation process. We present a novel dynamic Bayesian network (DBN) which integrates models of metric position, key, chord, bass note and two beat-synchronous audio features (bass and treble chroma) into a single high-level musical context model. We simultaneously infer the most probable sequence of metric positions, keys, chords and bass notes via Viterbi inference. Several experiments with real-world data show that adding context parameters results in a significant increase in chord recognition accuracy and faithfulness of chord segmentation. The proposed, most complex method transcribes chords with a state-of-the-art accuracy of 73% on the song collection used for the 2009 MIREX Chord Detection tasks. This method is used as a baseline method for two further enhancements. Firstly, we aim to improve chord confusion behaviour by modifying the audio front-end processing. We compare the effect of learning chord profiles as Gaussian mixtures to the effect of using chromagrams generated from an approximate pitch transcription method. We show that using chromagrams from approximate transcription results in the most substantial increase in accuracy. The best method achieves 79% accuracy and significantly outperforms the state of the art. Secondly, we propose a method by which chromagram information is shared between repeated structural segments (such as verses) in a song. This can be done fully automatically using a novel structural segmentation algorithm tailored to this task. We show that the technique leads to a significant increase in accuracy and readability. The segmentation algorithm itself also obtains state-of-the-art results. A method that combines both of the above enhancements reaches an accuracy of 81%, a statistically significant improvement over the best result (74%) in the 2009 MIREX Chord Detection tasks.
    Engineering and Physical Research Council U

    Automatic transcription of polyphonic music exploiting temporal evolution

    PhD
    Automatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered to be solved, the creation of an automated system able to transcribe polyphonic music without restrictions on the degree of polyphony or the instrument type remains an open problem. In this thesis, research on automatic transcription is performed by explicitly incorporating information on the temporal evolution of sounds. First efforts address the problem by focusing on signal processing techniques and by proposing audio features that utilise temporal characteristics. Techniques for note onset and offset detection are also utilised to improve transcription performance. Subsequent approaches propose transcription models based on shift-invariant probabilistic latent component analysis (SI-PLCA), modelling the temporal evolution of notes in a multiple-instrument case and supporting frequency modulations in produced notes. Datasets and annotations for transcription research have also been created during this work. The proposed systems have been evaluated both privately and publicly within the Music Information Retrieval Evaluation eXchange (MIREX) framework, and have been shown to outperform several state-of-the-art transcription approaches. The developed techniques have also been employed for other tasks related to music technology, such as key modulation detection, temperament estimation, and automatic piano tutoring. Finally, the proposed music transcription models have also been utilised in a wider context, namely for modelling acoustic scenes.

    Exploiting prior knowledge during automatic key and chord estimation from musical audio

    Chords and keys are two ways of describing music. They are exemplary of a general class of symbolic notations that musicians use to exchange information about a music piece. This information can range from simple tempo indications such as “allegro” to precise instructions for a performer of the music. Concretely, both keys and chords are timed labels that describe the harmony during certain time intervals, where harmony refers to the way music notes sound together. Chords describe the local harmony, whereas keys offer a more global overview and consequently cover a sequence of multiple chords. Common to all music notations is that certain characteristics of the music are described while others are ignored. The adopted level of detail depends on the purpose of the intended information exchange. A simple description such as “menuet”, for example, only serves to roughly describe the character of a music piece. Sheet music, on the other hand, contains precise information about the pitch, discretised information pertaining to timing and limited information about the timbre. Its goal is to permit a performer to recreate the music piece. Even so, the information about timing and timbre still leaves some space for interpretation by the performer. The opposite of a symbolic notation is a music recording. It stores the music in a way that allows for a perfect reproduction. The disadvantage of a music recording is that it does not allow a single aspect of a music piece to be manipulated in isolation, or at least not without degrading the quality of the reproduction. For instance, it is not possible to change the instrumentation in a music recording, even though this would only require the simple change of a few symbols in a symbolic notation. Despite the fundamental differences between a music recording and a symbolic notation, the two are of course intertwined.
Trained musicians can listen to a music recording (or live music) and write down a symbolic notation of the played piece. This skill allows one, in theory, to create a symbolic notation for each recording in a music collection. In practice, however, this would be too labour intensive for the large collections that are available these days through online stores or streaming services. Automating the notation process is therefore a necessity, and this is exactly the subject of this thesis. More specifically, this thesis deals with the extraction of keys and chords from a music recording. A database with keys and chords opens up applications that are not possible with a database of music recordings alone. On one hand, chords can be used on their own as a compact representation of a music piece, for example to learn how to play an accompaniment for singing. On the other hand, keys and chords can also be used indirectly to accomplish another goal, such as finding similar pieces. Because music theory has been studied for centuries, a great body of knowledge about keys and chords is available. It is known that consecutive keys and chords form sequences that are anything but random. People happen to have certain expectations that must be fulfilled in order to experience music as pleasant. Keys and chords are also strongly intertwined, as a given key implies that certain chords will likely occur and a set of given chords implies an encompassing key in return. Consequently, a substantial part of this thesis is concerned with the question of whether musicological knowledge can be embedded in a technical framework in such a way that it helps to improve the automatic recognition of keys and chords. The technical framework adopted in this thesis is built around a hidden Markov model (HMM). This facilitates an easy separation of the different aspects involved in the automatic recognition of keys and chords.
Most experiments reviewed in the thesis focus on taking into account musicological knowledge about the musical context and about the expected chord duration. Technically speaking, this involves a manipulation of the transition probabilities in the HMMs. To account for the interaction between keys and chords, every HMM state represents the combination of a key and a chord label. In the first part of the thesis, a number of alternatives for modelling the context are proposed. In particular, separate key change and chord change models are defined such that they closely mirror the way musicians conceive harmony. Multiple variants are considered that differ in the size of the context that is accounted for and in the knowledge source from which they were compiled. Some models are derived from a music corpus with key and chord notations whereas others follow directly from music theory. In the second part of the thesis, the contextual models are embedded in a system for automatic key and chord estimation. The features used in that system are so-called chroma profiles, which represent the saliences of the pitch classes in the audio signal. These chroma profiles are acoustically modelled by means of templates (idealised profiles) and a distance measure. In addition to these acoustic models and the contextual models developed in the first part, durational models are also required. The latter ensure that the chord and key estimations attain specified mean durations. The resulting system is then used to conduct experiments that provide more insight into how each system component contributes to the ultimate key and chord output quality. During the experimental study, the system complexity is gradually increased, starting from a system containing only an acoustic model of the features, which is subsequently extended, first with duration models and afterwards with contextual models.
The experiments show that taking into account the mean key and mean chord duration is essential to arrive at acceptable results for both key and chord estimation. The effect of using contextual information, however, is highly variable. On one hand, the chord change model has only a limited positive impact on the chord estimation accuracy (two to three percentage points), but this impact is fairly stable across different model variants. On the other hand, the chord change model has a much larger potential to improve the key output quality (up to seventeen percentage points), but only on the condition that the variant of the model is well adapted to the tested music material. Lastly, the key change model has only a negligible influence on the system performance. In the final part of this thesis, a couple of extensions to the formerly presented system are proposed and assessed. First, the global mean chord duration is replaced by key-chord specific values, which has a positive effect on the key estimation performance. Next, the HMM system is modified such that the prior chord duration distribution is no longer a geometric distribution but one that better approximates the observed durations in an appropriate data set. This modification leads to a small improvement of the chord estimation performance, but of course, it requires the availability of a suitable data set with chord notations from which to retrieve a target durational distribution. A final experiment demonstrates that increasing the scope of the contextual model only leads to statistically insignificant improvements. On top of that, the required computational load increases greatly.
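
As a rough illustration of the duration-aware decoding this abstract describes, the Python sketch below matches chroma frames against idealised triad templates and smooths the result with Viterbi decoding. It is not the system from the thesis: the states here are chords only (no key component), and the binary templates, the cosine distance, the `sharpness` scaling, the uniform chord-change probabilities and all function names are assumptions made for the example. The self-transition probability is set to 1 - 1/mean_duration, the value at which a geometric duration distribution has the requested mean (measured in frames).

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_templates():
    """Binary 12-bin templates for the 24 major and minor triads (an assumption)."""
    templates = {}
    for root in range(12):
        for suffix, intervals in (("", (0, 4, 7)), ("m", (0, 3, 7))):
            profile = [0.0] * 12
            for interval in intervals:
                profile[(root + interval) % 12] = 1.0
            templates[PITCH_CLASSES[root] + suffix] = profile
    return templates


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def decode_chords(chroma_frames, mean_duration=4.0, sharpness=5.0):
    """Viterbi decoding of chord labels from a list of 12-bin chroma frames."""
    if mean_duration <= 1.0:
        raise ValueError("mean_duration must exceed one frame")
    if not chroma_frames:
        return []
    templates = chord_templates()
    labels = list(templates)
    n = len(labels)
    # Geometric duration model: staying with probability p gives mean 1 / (1 - p).
    stay = 1.0 - 1.0 / mean_duration
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (n - 1))  # uniform over the other chords

    def log_obs(frame):
        # Hypothetical acoustic score: a smaller distance gives a higher score.
        return [-sharpness * cosine_distance(frame, templates[l]) for l in labels]

    score = log_obs(chroma_frames[0])  # uniform prior adds only a constant
    backpointers = []
    for frame in chroma_frames[1:]:
        obs = log_obs(frame)
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(
                range(n),
                key=lambda i: score[i] + (log_stay if i == j else log_move),
            )
            transition = log_stay if best_i == j else log_move
            new_score.append(score[best_i] + transition + obs[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [labels[s] for s in path]
```

Raising `mean_duration` makes chord changes more expensive, so isolated frames that briefly resemble another chord tend to be absorbed into the surrounding chord, while a sustained change is still followed.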

    A Data-Driven Model of Tonal Chord Sequence Complexity

    From heuristics-based to data-driven audio melody extraction

    The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advances in melody extraction and shows a promising path for future research and applications.

    Learning, Probability and Logic: Toward a Unified Approach for Content-Based Music Information Retrieval

    Within the last 15 years, the field of Music Information Retrieval (MIR) has made tremendous progress in the development of algorithms for organizing and analyzing the ever-increasing, large and varied amount of music and music-related data available digitally. However, the development of content-based methods to enable or ameliorate multimedia retrieval still remains a central challenge. In this perspective paper, we critically look at the problem of automatic chord estimation from audio recordings as a case study of content-based algorithms, and point out several bottlenecks in current approaches: expressiveness and flexibility are obtained at the expense of robustness and vice versa; available multimodal sources of information are under-exploited; modeling multi-faceted and strongly interrelated musical information is limited with current architectures; models are typically restricted to short-term analysis that does not account for the hierarchical temporal structure of musical signals. Dealing with music data requires the ability to tackle both uncertainty and complex relational structure at multiple levels of representation. Traditional approaches have generally treated these two aspects separately, with probability and learning being the usual way to represent uncertainty in knowledge, and logical representation being the usual way to represent knowledge and complex relational information. We argue that the identified hurdles of current approaches could be overcome by recent developments in the area of Statistical Relational Artificial Intelligence (StarAI), which unifies probability, logic and (deep) learning. We show that existing approaches used in MIR find powerful extensions and unifications in StarAI, and we explain why we think it is time to consider the new perspectives offered by this promising research field.

    Abcl: Abc music notation with rich chord support

    The relevance of accompaniment chords is well known, but there is a lack of tools capable of automatically generating sound from them. In this paper we describe a domain-specific language (Abcl) intended as a prototyping environment for new experimental music operators. Currently Abcl: (1) adds support for accompaniment chords (chordmode, instruments, chord-lines); (2) adds clearer support for percussion (drums, drum-machine); (3) adds support for variables and functions. The Abcl tool is a syntactic preprocessor that produces Abc. The DSLToolkit, used to create Abcl, is also briefly presented and discussed in the paper.

    Singing information processing: techniques and applications

    Singing voice is an essential component of music in every culture of the world, since it is an incredibly natural form of musical expression. Consequently, the automatic processing of singing voice has a great impact from the perspectives of industry, culture and science. In this context, this thesis contributes a varied set of techniques and applications related to singing voice processing, together with a review of the associated state of the art in each case. First, several of the best known pitch estimators are compared for the query-by-humming use case. The results show that \cite{Boersma1993} (with a non-obvious parameter setting) and \cite{Mauch2014} perform very well in this use case, given the smoothness of the extracted pitch contours. In addition, a novel singing voice transcription system is proposed, based on a hysteresis process defined in time and frequency, together with a tool for singing voice evaluation in Matlab. The appeal of the proposed method is that it achieves error rates close to the state of the art with a very simple method. The proposed evaluation tool, for its part, is a useful resource for better defining the problem and for better evaluating the solutions proposed by future researchers. This thesis also presents a method for the automatic assessment of vocal performance. It uses dynamic time warping to align the user's performance with a reference, thereby providing a score for intonation accuracy and for rhythm. The evaluation of the system shows a high correlation between the scores given by the system and the scores annotated by a group of expert musicians.
Furthermore, a method for realistically changing the intensity of the singing voice is presented. This transformation is based on a parametric model of the spectral envelope, and it substantially improves the perceived realism when compared with commercial software such as Melodyne or Vocaloid. The drawback of the proposed approach is that it requires manual intervention, but the results obtained yield important conclusions towards automatic intensity modification with realistic results. Finally, a method for correcting dissonances in isolated chords is proposed. It is based on a multiple-F0 analysis and a frequency shift of the sinusoidal components. The evaluation was carried out by a group of trained musicians, and shows a clear increase in perceived consonance after the proposed transformation.
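
The dynamic time warping alignment mentioned in this abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of the thesis: pitch tracks are taken to be lists of pitch values in semitones (for example MIDI note numbers), the local cost is the absolute pitch difference, and the function name `dtw_align` is hypothetical. How the thesis turns an alignment into intonation and rhythm scores is not specified in the abstract, so no scoring is attempted here.

```python
def dtw_align(reference, performance):
    """Align two pitch sequences with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of index pairs
    (reference_index, performance_index) from start to end.
    """
    if not reference or not performance:
        raise ValueError("both sequences must be non-empty")
    n, m = len(reference), len(performance)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of the first i reference values
    # with the first j performance values.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(reference[i - 1] - performance[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # both sequences advance
                cost[i - 1][j],      # only the reference advances
                cost[i][j - 1],      # only the performance advances
            )

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# A performance that holds the first note one frame too long still aligns
# with zero pitch cost; the timing deviation shows up in the path instead.
total, path = dtw_align([60, 62, 64], [60, 60, 62, 64])
```

In a sketch like this, the accumulated cost reflects pitch deviations along the alignment, while the shape of the path (how far it strays from the diagonal) reflects timing deviations, which is one plausible way to separate the two kinds of error.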