309 research outputs found
Unpitched percussion transcription in audio signals
Content-based description of music has become a significant research topic, since technological advances let the amount of digitally available music explode. One of the most interesting and challenging problems, among the wide range of disciplines in this field, is that of music transcription. The term transcription refers to the task of estimating the temporal locations of sound events and recognising the instruments which have been used to produce them. Research in this discipline primarily focused on the extraction of melodic and tonal information, until more recently the extraction of rhythmic structures received the same degree of attention. As percussive instruments form the rhythmic backbone of a musical piece, their transcription is a key component in representing and understanding music.
This thesis explores state-of-the-art signal processing techniques that have found application in percussion transcription and describes a template-matching-based transcription system in more detail, which has been implemented and evaluated in the course of this thesis.
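The template-matching idea can be illustrated with a toy sketch (the function and variable names here are hypothetical, and the system described in the thesis is far more elaborate): each spectrogram frame is correlated against a fixed spectral template of the target drum, and frames whose normalised correlation exceeds a threshold are reported as candidate hits.

```python
import numpy as np

def match_template(spec, template, threshold=0.8):
    """Correlate each spectrogram frame (columns of `spec`) with a fixed
    drum template and return indices of frames whose normalised
    correlation exceeds `threshold`."""
    t = template / (np.linalg.norm(template) + 1e-12)
    hits = []
    for i, frame in enumerate(spec.T):
        f = frame / (np.linalg.norm(frame) + 1e-12)
        if float(np.dot(f, t)) >= threshold:
            hits.append(i)
    return hits

# Toy example: a 4-bin "kick" template, with matching energy at frames 1 and 4.
template = np.array([1.0, 1.0, 0.0, 0.0])
spec = np.array([
    [0.0, 1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 1.0, 2.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 1.0, 0.0],
])
print(match_template(spec, template))  # [1, 4]
```

Normalising both template and frame makes the score insensitive to overall loudness, so a louder hit (frame 4) matches as well as a quiet one (frame 1).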
A review of automatic drum transcription
In Western popular music, drums and percussion are an important means to emphasize and shape the rhythm, often defining the musical style. If computers were able to analyze the drum part in recorded music, it would enable a variety of rhythm-related music processing tasks. Especially the detection and classification of drum sound events by computational methods is considered to be an important and challenging research problem in the broader field of Music Information Retrieval. Over the last two decades, several authors have attempted to tackle this problem under the umbrella term Automatic Drum Transcription (ADT). This paper presents a comprehensive review of ADT research, including a thorough discussion of the task-specific challenges, categorization of existing techniques, and evaluation of several state-of-the-art systems. To provide more insights on the practice of ADT systems, we focus on two families of ADT techniques, namely methods based on Nonnegative Matrix Factorization and Recurrent Neural Networks. We explain the methods’ technical details and drum-specific variations and evaluate these approaches on publicly available datasets with a consistent experimental setup. Finally, the open issues and under-explored areas in ADT research are identified and discussed, providing future directions in this field.
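As a concrete illustration of the NMF family of ADT methods surveyed above, the following sketch (assumed names and a deliberately tiny setup, not code from the paper) fixes one spectral template per drum and estimates the activation matrix with the standard KL-divergence multiplicative update; drum onsets then appear as peaks in the activation rows.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-10):
    """Given a magnitude spectrogram V (bins x frames) and fixed drum
    templates W (bins x drums), estimate activations H (drums x frames)
    with KL-divergence multiplicative updates, keeping W fixed."""
    H = np.ones((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return H

# Toy example: "kick" energy in the low bins, "snare" in the high bins.
W = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
H_true = np.array([[1.0, 0.0, 0.0, 0.0],   # kick hit at frame 0
                   [0.0, 0.0, 1.0, 0.0]])  # snare hit at frame 2
V = W @ H_true
H = nmf_activations(V, W)
print(np.argmax(H[0]), np.argmax(H[1]))  # 0 2
```

Because W is held fixed, each multiplicative update only redistributes activation mass, and zero-energy frames collapse to zero activations, which is what makes peak picking on H a workable onset detector in this formulation.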
Score-Informed Source Separation for Music Signals
In recent years, the processing of audio recordings by exploiting additional musical knowledge has turned out to be a promising research direction. In particular, additional note information as specified by a musical score or a MIDI file has been employed to support various audio processing tasks such as source separation, audio parameterization, performance analysis, or instrument equalization. In this contribution, we provide an overview of approaches for score-informed source separation and illustrate their potential by discussing innovative applications and interfaces. Additionally, to illustrate some basic principles behind these approaches, we demonstrate how score information can be integrated into the well-known non-negative matrix factorization (NMF) framework. Finally, we compare this approach to advanced methods based on parametric models.
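One basic principle behind score-informed NMF can be sketched in a few lines (hypothetical names and a simplified setup: real systems typically also update the templates W, while this sketch keeps them fixed). The score is turned into a binary mask over the activation matrix H; because multiplicative updates preserve zeros, activity can only appear where the score allows it.

```python
import numpy as np

def score_informed_nmf(V, W, score_mask, n_iter=200, eps=1e-10):
    """NMF with score-constrained activations: H is initialised to the
    score mask (1 where a note may sound, 0 elsewhere).  Multiplicative
    KL updates preserve the zeros, so the decomposition follows the score.
    W holds one fixed spectral template per note."""
    H = score_mask.astype(float)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return H

# Toy example: two note templates; the score allows note 0 in frames 0-1
# and note 1 in frames 2-3.
W = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
score_mask = np.array([[1, 1, 0, 0],
                       [0, 0, 1, 1]])
H_true = np.array([[1.0, 0.5, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
V = W @ H_true
H = score_informed_nmf(V, W, score_mask)
print(np.round(H, 3))
```

The separated signal for one note (or instrument) is then obtained by masking the mixture spectrogram with that note's share of the reconstruction, W[:, k:k+1] @ H[k:k+1, :].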
Automatic classification of drum sounds with indefinite pitch
Automatic classification of musical instruments is an important task for music transcription as well as for professionals such as audio designers, engineers and musicians. Unfortunately, only a limited amount of effort has been devoted to automatically classifying percussion instruments in recent years. The studies that deal with percussion sounds are usually restricted to distinguishing among the instruments in the drum kit, such as toms vs. snare drum vs. bass drum vs. cymbals. In this paper, we are interested in the more challenging task of discriminating sounds produced by the same percussion instrument, specifically sounds from different types of drum cymbals. Cymbals are known to have indefinite pitch and nonlinear, chaotic behavior. We also identify how the sound of a specific cymbal was produced (e.g., roll or choke movements performed by a drummer). We achieve an accuracy of 96.59% for cymbal type classification and 91.54% in a classification problem with 12 classes which represent the cymbal type and the manner or region in which the cymbals are struck. Both results were obtained with the Support Vector Machine algorithm using Line Spectral Frequencies as the audio descriptor. We believe that our results can be useful for more detailed automatic drum transcription and for other related applications, as well as for audio professionals.
Fundação de Amparo a Pesquisa e Desenvolvimento do Estado de São Paulo (FAPESP) (grants 2011/17698-5
Interactive real-time musical systems
This thesis focuses on the development of automatic accompaniment systems. We investigate previous systems and look at a range of approaches that have been attempted for the problem of beat tracking. Most beat trackers are intended for the purposes of music information retrieval, where a 'black box' approach is tested on a wide variety of music genres. We highlight some of the difficulties facing offline beat trackers and design a new approach for the problem of real-time drum tracking, developing a system, B-Keeper, which makes reasonable assumptions on the nature of the signal and is provided with useful prior knowledge.
Having developed the system with offline studio recordings, we look to test the system with human players. Existing offline evaluation methods seem less suitable for a performance system, since we also wish to evaluate the interaction between musician and machine. Although statistical data may reveal quantifiable measurements of the system's predictions and behaviour, we also want to test how well it functions within the context of a live performance. To do so, we devise an evaluation strategy to contrast a machine-controlled accompaniment with one controlled by a human.
We also present recent work on real-time multiple pitch tracking, which is then extended to provide automatic accompaniment for harmonic instruments such as guitar. By aligning salient notes in the output from a dual pitch tracking process, we make changes to the tempo of the accompaniment in order to align it with a live stream. By demonstrating the system's ability to align offline tracks, we can show that under restricted initial conditions, the algorithm works well as an alignment tool.
Signal Processing Methods for Music Synchronization, Audio Matching, and Source Separation
The field of music information retrieval (MIR) aims at developing techniques and tools for organizing, understanding, and searching multimodal information in large music collections in a robust, efficient and intelligent manner. In this context, this thesis presents novel, content-based methods for music synchronization, audio matching, and source separation. In general, music synchronization denotes a procedure which, for a given position in one representation of a piece of music, determines the corresponding position within another representation. Here, the thesis presents three complementary synchronization approaches, which improve upon previous methods in terms of robustness, reliability, and accuracy. The first approach employs a late-fusion strategy based on multiple, conceptually different alignment techniques to identify those music passages that allow for reliable alignment results. The second approach is based on the idea of employing musical structure analysis methods in the context of synchronization to derive reliable synchronization results even in the presence of structural differences between the versions to be aligned. Finally, the third approach employs several complementary strategies for increasing the accuracy and time resolution of synchronization results.
Given a short query audio clip, the goal of audio matching is to automatically retrieve all musically similar excerpts in different versions and arrangements of the same underlying piece of music. In this context, chroma-based audio features are a well-established tool as they possess a high degree of invariance to variations in timbre. This thesis describes a novel procedure for making chroma features even more robust to changes in timbre while keeping their discriminative power. Here, the idea is to identify and discard timbre-related information using techniques inspired by the well-known MFCC features, which are usually employed in speech processing.
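The core synchronization step behind such approaches can be sketched as follows (assumed names; none of this code is from the thesis): two chroma sequences are aligned with classic dynamic time warping under a cosine distance, yielding for every frame of one version the corresponding frame of the other.

```python
import numpy as np

def dtw_chroma(X, Y):
    """Align two chroma sequences (frames x 12) with classic DTW under a
    cosine distance; returns the optimal warping path as (i, j) pairs."""
    n, m = len(X), len(Y)
    def dist(a, b):
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    # Accumulated-cost matrix with one padding row/column for the boundary.
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = dist(X[i - 1], Y[j - 1]) + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end to recover the optimal warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Toy example: Y repeats the middle frame of X once (one-hot chroma frames).
X = np.eye(12)[[0, 1, 2]]
Y = np.eye(12)[[0, 1, 1, 2]]
print(dtw_chroma(X, Y))  # [(0, 0), (1, 1), (1, 2), (2, 3)]
```

The repeated frame of Y is absorbed by a horizontal step in the path, which is exactly how synchronization copes with one version lingering longer on a note than the other.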
Given a monaural music recording, the goal of source separation is to extract musically meaningful sound sources corresponding, for example, to a melody, an instrument, or a drum track from the recording. To facilitate this complex task, one can exploit additional information provided by a musical score. Based on this idea, this thesis presents two novel, conceptually different approaches to source separation. Using score information provided by a given MIDI file, the first approach employs a parametric model to describe a given audio recording of a piece of music. The resulting model is then used to extract sound sources as specified by the score. As a computationally less demanding and easier to implement alternative, the second approach employs the additional score information to guide a decomposition based on non-negative matrix factorization (NMF).
Audio source separation for music in low-latency and high-latency scenarios
This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.
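The appeal of Tikhonov regularization for low-latency spectrum decomposition can be seen from its closed form (a minimal sketch with assumed names and a toy basis; the thesis's actual formulation is more involved): each incoming frame is decomposed with a single linear solve, with no iterative updates.

```python
import numpy as np

def tikhonov_decompose(v, W, lam=0.01):
    """Decompose one spectral frame v over a basis of source templates W by
    Tikhonov-regularised least squares:
        h = argmin_h ||v - W h||^2 + lam * ||h||^2
          = (W^T W + lam * I)^{-1} W^T v
    The closed form needs one solve per frame, which is what makes the
    method attractive under low-latency constraints."""
    K = W.shape[1]
    return np.linalg.solve(W.T @ W + lam * np.eye(K), W.T @ v)

# Toy example: recover the mixing weights of two orthogonal templates.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
v = W @ np.array([1.0, 0.5])
h = tikhonov_decompose(v, W, lam=1e-6)
print(np.round(h, 3))  # close to [1.0, 0.5]
```

Unlike NMF, the solution is not guaranteed non-negative; that is the price paid for avoiding iterative updates, and a separation system built on it has to handle (or suppress) negative activations downstream.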
A User-assisted Approach to Multiple Instrument Music Transcription
The task of automatic music transcription has been studied for several decades and is regarded as an enabling technology for a multitude of applications such as music retrieval and discovery, intelligent music processing and large-scale musicological analyses. It refers to the process of identifying the musical content of a performance and representing it in a symbolic format. Despite its long research history, fully automatic music transcription systems are still error-prone and often fail when more complex polyphonic music is analysed. This gives rise to the question of how human knowledge can be incorporated into the transcription process.
This thesis investigates ways to involve a human user in the transcription process. More specifically, it is investigated how user input can be employed to derive timbre models for the instruments in a music recording, which are employed to obtain instrument-specific (parts-based) transcriptions.
A first investigation studies different types of user input in order to derive instrument models by means of a non-negative matrix factorisation framework. The transcription accuracy of the different models is evaluated, and a method is proposed that refines the models by allowing each pitch of each instrument to be represented by multiple basis functions.
A second study aims at limiting the amount of user input to make the method more applicable in practice. Different methods are considered to estimate missing non-negative basis functions when only a subset of basis functions can be extracted based on the user information.
A method is proposed to track the pitches of individual instruments over time by means of a Viterbi framework in which the states at each time frame contain several candidate instrument-pitch combinations. A transition probability is employed that combines three different criteria: the frame-wise reconstruction error of each combination, a pitch continuity measure that favours similar pitches in consecutive frames, and an explicit activity model for each instrument. The method is shown to outperform other state-of-the-art multi-instrument tracking methods.
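The Viterbi idea described above can be sketched as follows (hypothetical names and a simplified transition cost: the thesis combines three criteria, while this toy version keeps only an observation cost and a pitch-continuity penalty). A globally optimal candidate sequence is found by dynamic programming over per-frame candidates.

```python
import numpy as np

def viterbi_pitch_track(obs_cost, jump_penalty=1.0):
    """Minimal Viterbi decoding over per-frame pitch candidates.
    obs_cost is a (frames x candidates) matrix of observation costs
    (e.g. frame-wise reconstruction errors); a continuity term penalises
    jumps between candidate indices.  Returns the optimal candidate index
    per frame."""
    T, N = obs_cost.shape
    idx = np.arange(N)
    trans = jump_penalty * np.abs(idx[:, None] - idx[None, :])  # prev x cur
    cost = obs_cost[0].copy()
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        total = cost[:, None] + trans          # cost of each prev -> cur move
        back[t] = np.argmin(total, axis=0)     # best predecessor per state
        cost = total[back[t], idx] + obs_cost[t]
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):              # backtrack to recover the path
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: pitch 2 is cheapest in frames 0 and 2; frame 1 is noisy and
# slightly favours pitch 0, but the continuity penalty keeps the track at 2.
obs = np.array([[1.0, 1.0, 0.0],
                [0.5, 1.0, 0.6],
                [1.0, 1.0, 0.0]])
print(viterbi_pitch_track(obs))  # [2, 2, 2]
```

A greedy frame-by-frame decision would switch to pitch 0 in the noisy middle frame; the dynamic-programming view trades that small local gain against the cost of two jumps, which is precisely the benefit of decoding the whole sequence at once.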
Finally, the extraction of instrument models that include phase information is investigated as a step towards complex matrix decomposition. The phase relations between the partials of harmonic sounds are explored as a time-invariant property that can be employed to form complex-valued basis functions. The application of the model for a user-assisted transcription task is illustrated with a saxophone example.
Separation of musical sources and structure from single-channel polyphonic recordings
EThOS - Electronic Theses Online Service (United Kingdom)
Automatic annotation of musical audio for interactive applications
As machines become more and more portable, and part of our everyday life, it becomes apparent that developing interactive and ubiquitous systems is an important aspect of new music applications created by the research community. We are interested in developing a robust layer for the automatic annotation of audio signals, to be used in various applications, from music search engines to interactive installations, and in various contexts, from embedded devices to audio content servers. We propose adaptations of existing signal processing techniques to a real-time context. Amongst these annotation techniques, we concentrate on low- and mid-level tasks such as onset detection, pitch tracking, tempo extraction and note modelling. We present a framework to extract these annotations and evaluate the performances of different algorithms.
The first task is to detect onsets and offsets in audio streams within short latencies. The segmentation of audio streams into temporal objects enables various manipulations and analyses of metrical structure. Evaluation of different algorithms and their adaptation to real time are described. We then tackle the problem of fundamental frequency estimation, again trying to reduce both the delay and the computational cost. Different algorithms are implemented for real time and experimented on monophonic recordings and complex signals. Spectral analysis can be used to label the temporal segments; the estimation of higher-level descriptions is approached. Techniques for modelling of note objects and localisation of beats are implemented and discussed.
Applications of our framework include live and interactive music installations, and more generally tools for composers and sound engineers. Speed optimisations may bring a significant improvement to various automated tasks, such as automatic classification and recommendation systems. We describe the design of our software solution, for our research purposes and in view of its integration within other systems.
EU-FP6-IST-507142 project SIMAC (Semantic Interaction with Music Audio Contents); EPSRC grants GR/R54620; GR/S75802/01