End-to-End Bayesian Segmentation and Similarity Assessment of Performed Music Tempo and Dynamics without Score Information
Segmenting continuous sensory input into coherent segments and subsegments is an important part of perception, and music is no exception. By shaping the acoustic properties of music during performance, musicians can strongly influence the perceived segmentation. Two main techniques musicians employ are the modulation of tempo and of dynamics. Such variations carry important information for segmentation and lend themselves well to numerical analysis. In this article, we propose a novel end-to-end Bayesian framework that uses dynamic programming to retrieve a musician's expressed segmentation from tempo or loudness modulations alone. The method computes the credence of all possible segmentations of the recorded performance. The output is summarized in two forms: as a beat-by-beat profile revealing the posterior credence of plausible boundaries, and as expanded credence segment maps, a novel representation that converts readily to a segmentation lattice but retains information about the posterior uncertainty on the exact positions of segment endpoints. To compare any two segmentation profiles, we introduce a method based on unbalanced optimal transport. Experimental results on the MazurkaBL dataset show that, despite the drastic dimensionality reduction of the input data, the recovered segmentation is sufficient for deriving musical insights from a comparative examination of recorded performances. This Bayesian segmentation method thus offers an alternative to binary boundary detection and finds multiple segmentation hypotheses consistent with the information in recorded music performances.
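As a rough illustration of how dynamic programming can give boundary credences summed over all segmentations at once, the Python sketch below uses a simple product-partition model. This model is an assumption, not the authors' framework: within a segment, tempo values are Gaussian around a segment mean that itself has a Gaussian prior, and every segment pays a fixed log-prior penalty. The names `segment_loglik` and `boundary_posterior` and all parameter values are hypothetical. A forward pass sums over all segmentations ending at each beat, a backward pass over all segmentations starting there, and their product gives the posterior probability of a boundary before each beat.

```python
import numpy as np

def segment_loglik(y, sigma=2.0, tau=30.0, mu0=120.0):
    """Log marginal likelihood of one segment (assumed model):
    y_k ~ N(mu, sigma^2) i.i.d., with mu ~ N(mu0, tau^2) integrated out."""
    m = len(y)
    ybar = y.mean()
    s = ((y - ybar) ** 2).sum()
    return (-0.5 * m * np.log(2 * np.pi * sigma**2)
            - 0.5 * np.log(1 + m * tau**2 / sigma**2)
            - 0.5 * (s / sigma**2 + m * (ybar - mu0) ** 2 / (sigma**2 + m * tau**2)))

def boundary_posterior(y, log_seg_prior=-5.0, **kw):
    """Posterior probability of a boundary before beat t, for t = 1..n-1,
    summed exactly over every segmentation of y (O(n^2) segments)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # score[i, j]: log weight of one segment covering beats i..j-1
    score = np.full((n + 1, n + 1), -np.inf)
    for i in range(n):
        for j in range(i + 1, n + 1):
            score[i, j] = segment_loglik(y[i:j], **kw) + log_seg_prior
    # fwd[j]: log total weight of all segmentations of beats 0..j-1
    fwd = np.full(n + 1, -np.inf)
    fwd[0] = 0.0
    for j in range(1, n + 1):
        fwd[j] = np.logaddexp.reduce(fwd[:j] + score[:j, j])
    # bwd[i]: log total weight of all segmentations of beats i..n-1
    bwd = np.full(n + 1, -np.inf)
    bwd[n] = 0.0
    for i in range(n - 1, -1, -1):
        bwd[i] = np.logaddexp.reduce(score[i, i + 1:] + bwd[i + 1:])
    log_z = fwd[n]
    # Entries 0 and n stay 0: they are the piece's start and end, not interior boundaries.
    post = np.zeros(n + 1)
    post[1:n] = np.exp(fwd[1:n] + bwd[1:n] - log_z)
    return post
```

With a tempo curve that jumps from 100 to 140 BPM after ten beats, `boundary_posterior([100.0] * 10 + [140.0] * 10)` concentrates nearly all boundary mass at index 10. The flat-plateau likelihood is only a placeholder; a model of expressive tempo would more plausibly score arcs or ramps within each segment.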
A Model of Rhythm Transcription as Path Selection Through Approximate Common Divisor Graphs
We apply the concept of approximate common divisors (ACDs) to estimate the tempo and quantize the durations of a rhythmic sequence. The ACD models the duration of the tatum within the sequence, giving its rate in beats per minute. The rhythm input, a series of timestamps, is first split into overlapping frames. Then, we compute the possible ACDs that fit each frame and build a graph with the candidate ACDs as nodes. Building this graph transforms the quantization problem into one of path selection: the nodes represent the ACDs, which determine the note values of the transcription, and the edges represent tempo transitions between frames. A path through the graph thus corresponds to a rhythm transcription. For path selection, we present both an automated method, which uses weights to evaluate the transcription and finds the shortest path, and an interactive approach that lets users influence the path selection.
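The pipeline above (candidate ACDs per frame, then a shortest path through the layered graph) can be sketched in Python as follows. Everything specific here is an assumption rather than the paper's method: candidates come from a brute-force grid search over tatum durations in seconds, a node's weight is its fit error plus a small penalty favouring coarser tatums, and an edge's weight is the absolute log ratio of consecutive tatums (a tempo-change cost). The functions `deviation`, `acd_candidates`, `node_cost`, and `select_path` and the parameters `tol`, `beta`, and `gamma` are hypothetical.

```python
import math

def deviation(iois, d):
    """Largest distance of an inter-onset interval from an integer (>= 1)
    multiple of the candidate tatum d, measured in units of d."""
    mults = [round(x / d) for x in iois]
    if min(mults) < 1:
        return math.inf
    return max(abs(x / d - k) for x, k in zip(iois, mults))

def acd_candidates(iois, lo=0.1, hi=0.5, step=0.001, tol=0.1):
    """Grid values of d (seconds) whose deviation is <= tol and is a local
    minimum on the grid, so each band of near-equivalent tatums gives one node."""
    grid = [lo + k * step for k in range(int(round((hi - lo) / step)) + 1)]
    errs = [deviation(iois, d) for d in grid]
    cands = []
    for k, e in enumerate(errs):
        left = errs[k - 1] if k > 0 else math.inf
        right = errs[k + 1] if k + 1 < len(errs) else math.inf
        if e <= tol and e <= left and e <= right:
            cands.append(grid[k])
    return cands

def node_cost(iois, d, beta=0.01):
    """Fit error plus a small penalty on the mean number of tatums per interval."""
    return deviation(iois, d) + beta * sum(round(x / d) for x in iois) / len(iois)

def select_path(frames, gamma=1.0):
    """Shortest path through the layered ACD graph by dynamic programming.
    Assumes every frame yields at least one candidate."""
    layers = [acd_candidates(f) for f in frames]
    # best[t][i] = (cost of best path ending at node i of layer t, backpointer)
    best = [[(node_cost(frames[0], d), None) for d in layers[0]]]
    for t in range(1, len(frames)):
        row = []
        for d in layers[t]:
            cost, arg = min((best[t - 1][i][0] + gamma * abs(math.log(d / p)), i)
                            for i, p in enumerate(layers[t - 1]))
            row.append((cost + node_cost(frames[t], d), arg))
        best.append(row)
    # Backtrack from the cheapest node of the last layer.
    i = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for t in range(len(frames) - 1, -1, -1):
        path.append(layers[t][i])
        i = best[t][i][1]
    return path[::-1]
```

For two frames of intervals `[0.5, 0.25, 0.25, 0.5]` and `[0.5, 0.5, 0.25]`, both 0.125 s and 0.25 s fit exactly; the complexity penalty and the tempo-change edges make `select_path` choose 0.25 s in both frames, that is, 60 / 0.25 = 240 tatums per minute. An interactive variant could let a user pin or forbid nodes before the same dynamic program runs.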
Semiotic Description of Music Structure: an Introduction to the Quaero/Metiss Structural Annotations
12 pages. Interest has been steadily growing in semantic audio and music information retrieval for the description of music structure, i.e., the global organization of music pieces in terms of large-scale structural units. This article presents a detailed methodology for the semiotic description of music structure, based on concepts and criteria formulated as generically as possible. We summarize the essential principles and practices developed during an annotation effort conducted by our research group (Metiss) on audio data in the context of the Quaero project, which led to the public release of over 380 annotations of pop songs from three different data sets. The paper also includes a few case studies and a concise statistical overview of the annotated data.
From Dürer's Magic Square to Klumpenhouwer Tesseracts: On Melencolia (2013) by Philippe Manoury
Many Western art music composers have drawn on tabulated data to nourish their creative practices, particularly since the early twentieth century. The arrival of atonality and serial techniques was crucial to this shift. Among the composers working with such tables, some have considered the singular mathematical properties of magic squares. This paper focuses on one such case study: Philippe Manoury's Third String Quartet, entitled Melencolia. Drawing on the French composer's own sketches, we mainly analyse several strategies he conceived to manipulate pitches and pitch classes over time. To that end, we use Klumpenhouwer networks (K-nets) to establish a wide and dense web of isographic relationships. Our hyper-K-nets sometimes reach a total of 32 arrows, which allow geometric arrangements as tesseracts whose different dimensions cluster related families of isographies. In doing so, we aim to provide an instructive example of how K-nets and isographies can be contextualised as powerful tools for the analysis of compositional practices.
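The arrow labels of a K-net follow simple arithmetic, and isography between two networks of the same shape can be checked by brute force over the automorphisms of the T/I group. The Python sketch below assumes the usual conventions: pitch classes are integers mod 12; an arrow a to b is labeled T_n with n = b - a, or I_n with n = a + b; and Lewin's automorphisms F_{u,m} send T_n to T_{un} and I_n to I_{un+m}, with u in {1, 5, 7, 11}. Here u = 1 gives positive isography <T_m> and u = 11 negative isography <I_m>. The function names and the example networks are hypothetical; they do not reproduce Manoury's sketches or the paper's hyper-K-nets.

```python
def klabel(a, b, kind):
    """Label of the K-net arrow a -> b (pitch classes mod 12).
    T_n maps x to x + n, so n = b - a; I_n maps x to n - x, so n = a + b."""
    if kind == "T":
        return ("T", (b - a) % 12)
    if kind == "I":
        return ("I", (a + b) % 12)
    raise ValueError("kind must be 'T' or 'I'")

def apply_automorphism(label, u, m):
    """Lewin's automorphism F_{u,m}: T_n -> T_{un}, I_n -> I_{un+m} (mod 12)."""
    kind, n = label
    return (kind, (u * n) % 12) if kind == "T" else (kind, (u * n + m) % 12)

def isographies(net1, net2):
    """Every (u, m) such that F_{u,m} maps each arrow label of net1 onto the
    label of the corresponding arrow of net2 (arrows listed in the same order)."""
    if len(net1) != len(net2):
        return []
    labels1 = [klabel(*arrow) for arrow in net1]
    labels2 = [klabel(*arrow) for arrow in net2]
    return [(u, m) for u in (1, 5, 7, 11) for m in range(12)
            if all(apply_automorphism(l1, u, m) == l2
                   for l1, l2 in zip(labels1, labels2))]

# Hypothetical example: a K-net on C, E, G and the same shape on D, F#, A.
net_ceg = [(0, 4, "I"), (4, 7, "T"), (0, 7, "I")]
net_dfa = [(2, 6, "I"), (6, 9, "T"), (2, 9, "I")]
print(isographies(net_ceg, net_dfa))  # [(1, 4), (5, 0)]
```

The search finds the positive isography <T_4> (u = 1, m = 4) together with an M5-type automorphism (u = 5, m = 0). Restricting u to {1, 11} keeps only the classical positive and negative isographies usually discussed in K-net analysis.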
The Tonnetz Environment: A Web Platform for Computer-aided “Mathemusical” Learning and Research (preprint version)
We describe the Tonnetz web environment and some of the applications we have developed within a pedagogical workshop on mathematics and music designed for high-school students. This web environment uses two geometric representations that offer intuitive access to some of the theoretical concepts underlying the equal-tempered system and their possible mathematical formalizations. The environment aims to enhance “mathemusical” learning by enabling users to interactively manipulate these representations. Finally, we show how Tonnetz is currently being adapted to conduct computer-based experiments in music perception and cognition, to be carried out mainly at universities. These experiments will explore how geometric models might be implicitly encoded during listening. Their outcomes may reinforce educational strategies for learning mathematics through music.
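To make the geometric idea concrete, the Python sketch below uses the standard neo-Riemannian Tonnetz layout, which is an assumption about how such a lattice can be coordinatized rather than a description of the environment's code: lattice point (i, j) carries pitch class (7i + 4j) mod 12, so one axis steps by perfect fifths, the other by major thirds, and the diagonal (i + 1, j - 1) steps by minor thirds. Major and minor triads then appear as the two orientations of triangles. All function names are hypothetical.

```python
PC_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def tonnetz_pc(i, j):
    """Pitch class at lattice point (i, j): i fifths plus j major thirds, mod 12."""
    return (7 * i + 4 * j) % 12

def major_triad(i, j):
    """Upward triangle: root (i, j), fifth (i + 1, j), major third (i, j + 1)."""
    return sorted({tonnetz_pc(i, j), tonnetz_pc(i + 1, j), tonnetz_pc(i, j + 1)})

def minor_triad(i, j):
    """Downward triangle sharing the edge (i + 1, j)-(i, j + 1) with major_triad(i, j)."""
    return sorted({tonnetz_pc(i + 1, j), tonnetz_pc(i, j + 1), tonnetz_pc(i + 1, j + 1)})

def names(pcs):
    return [PC_NAMES[p] for p in pcs]

print(names(major_triad(0, 0)), names(minor_triad(0, 0)))
```

At the origin this gives C major (C, E, G) and, across the shared E-G edge, E minor (E, G, B); flipping a triangle across an edge is how the Tonnetz visualizes parsimonious moves such as this one between triads sharing two notes.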
Compression Models and Complexity Criteria for the Description and Inference of Music Structure
A very broad definition of music structure considers everything that distinguishes music from random noise to be part of its structure. In this thesis, we focus on the macroscopic aspects of music structure, especially the decomposition of musical pieces into autonomous segments (typically, sections) and their characterisation as the result of grouping jointly compressible units. A key premise of this work is the existence of a link between music structure inference and information-theoretic concepts such as complexity and entropy. We thus build upon the hypothesis that structural segments can be inferred through compression schemes. In the first part of this work, we study Straight-Line Grammars (SLGs), a family of formal grammars originally used for structure discovery in biological sequences (Gallé, 2011), and we explore their use for modelling musical sequences. The SLG approach compresses sequences based on their occurrence statistics, yielding a tree-based model of their hierarchical organisation. We develop several adaptations of this method for modelling approximate repetitions, as well as several regularity criteria aimed at improving the efficiency of the method. The second part of this thesis develops and explores a novel approach to music structure inference based on the optimisation of a tensorial compression criterion. This approach aims to compress musical information on several time scales simultaneously by exploiting the similarity relations, logical progressions, and systems of analogy embedded in musical segments. The proposed method is first introduced from a formal point of view, then presented as a compression scheme rooted in a multi-scale extension of the System & Contrast model (Bimbot et al., 2012) to hypercubic tensorial patterns. 
Furthermore, we generalise the approach to other, irregular, tensorial patterns in order to account for the great variety of structural organisations observed in musical segments. The methods presented in this thesis are tested on a structural segmentation task using symbolic data, namely chord sequences from pop music (RWC-Pop). The methods are evaluated and compared on several sets of chord sequences, and the results establish an experimental advantage for complexity-based approaches to structure analysis in music information retrieval, with the best variants reaching F-measures of around 70%. To conclude, we recapitulate the main contributions of this work and discuss possible extensions of the studied paradigms through their application to other musical dimensions, the inclusion of musicological knowledge, and their possible use on audio data.
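One way to see how a straight-line grammar compresses a chord sequence into a hierarchy is the classic Re-Pair procedure, sketched below in Python. It is a swapped-in, deliberately simple SLG inference method, not Gallé's algorithm or the thesis's adaptations: it repeatedly replaces the most frequent adjacent pair of symbols with a fresh nonterminal until no pair repeats. The function names and the `R0`, `R1`, ... nonterminal naming are hypothetical, and chord labels are assumed not to collide with those names.

```python
from collections import Counter

def repair(seq):
    """Re-Pair-style straight-line grammar inference.
    Returns the compressed top-level sequence and the rules {nonterminal: (left, right)}."""
    seq = list(seq)
    rules = {}
    while True:
        # Simplification: overlapping occurrences (e.g. "a a a") are counted too.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        nt = f"R{len(rules)}"
        rules[nt] = pair
        # Replace non-overlapping occurrences of the pair, scanning left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(symbol, rules):
    """Recursively rewrite a symbol back into terminal chord labels."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)

chords = ["C", "G", "Am", "F"] * 4
top, rules = repair(chords)
print(top, rules)
```

On four repetitions of C G Am F, the sixteen chords collapse to a two-symbol top level, and the nested rules form the derivation tree whose internal nodes are candidate structural units. Approximate repetitions, as studied in the thesis, would require a pair-matching step that tolerates substitutions.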
Dictionary Learning for Audio Inpainting
Audio recordings often show undesirable alterations, mostly noise or the corruption of short portions of the signal. Clipping, or saturation, is one such alteration. Several techniques have been developed to reverse this corruption, with good but improvable results. One of these techniques, developed in the METISS project-team, relies on sparse representations, a popular model in signal processing. The principle of sparse representations is to describe a high-dimensional data vector as a linear combination of a few prototype vectors, called atoms, selected from a large collection called the dictionary. Building on this technique, the aim of this internship is to determine whether and how machine learning can be applied to the dictionary to further improve the results: what to learn from, with which learning algorithm, and on which kinds of signals it works.
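The sparse-representation approach to declipping can be sketched as follows: estimate a sparse code using only the reliable (unclipped) samples, then resynthesize every sample from the full dictionary. In this Python sketch the dictionary is a fixed orthonormal DCT, the sparse coder is Orthogonal Matching Pursuit, and the test signal is synthetic; these are assumptions for illustration, not the METISS implementation. A learned dictionary, which is the internship's question, would simply replace `D`. The functions `dct_dictionary`, `omp`, and `declip` are hypothetical.

```python
import numpy as np

def dct_dictionary(n):
    """Orthonormal DCT-II atoms as columns: a fixed, not learned, dictionary."""
    t = np.arange(n)[:, None]
    f = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * f / n)
    D[:, 0] /= np.sqrt(2.0)
    return D

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedily select k atoms (columns of D),
    refitting all selected coefficients by least squares after each pick."""
    norms = np.linalg.norm(D, axis=0) + 1e-12
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        scores = np.abs(D.T @ residual) / norms
        if support:
            scores[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def declip(y_clipped, D, k, clip_level):
    """Fit the sparse code on unclipped rows only, then resynthesize all samples."""
    reliable = np.abs(y_clipped) < clip_level
    x = omp(D[reliable], y_clipped[reliable], k)
    return D @ x

# Synthetic demo: a 2-sparse signal clipped at 95% of its peak.
D = dct_dictionary(64)
s = D[:, 2] + 0.5 * D[:, 5]
level = 0.95 * np.abs(s).max()
y = np.clip(s, -level, level)
rec = declip(y, D, k=2, clip_level=level)
print(np.abs(y - s).max(), np.abs(rec - s).max())
```

In this toy case only a couple of samples are clipped, so the two correct atoms are recovered and the reconstruction matches the original signal up to floating-point error. Real declipping methods usually also constrain the restored samples to lie beyond the clipping level, which this sketch omits.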