732 research outputs found
Drum Transcription via Classification of Bar-level Rhythmic Patterns
Accepted. Matthias Mauch is supported by a Royal Academy of Engineering Research Fellowship.
A simulated annealing optimization of audio features for drum classification
Current methods for the accurate recognition of instruments within music are based on discriminative data descriptors. These are features of the music fragment that capture the characteristics of the audio and suppress details that are redundant for the problem at hand. The extraction of such features from an audio signal requires the user to set certain parameters. We propose a method for optimizing the parameters for a particular task on the basis of the Simulated Annealing algorithm and Support Vector Machine classification. We show that using an optimized set of audio features improves the recognition accuracy of drum sounds in music fragments.
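As a rough illustration of the approach described, not the paper's implementation, the sketch below runs a simulated-annealing loop over a few feature-extraction parameters and scores each candidate by cross-validated SVM accuracy; the parameter set, the MFCC-based extract_features() helper, and the cooling schedule are all assumptions.

```python
# Minimal sketch: simulated annealing over feature-extraction parameters,
# scored by cross-validated SVM accuracy. The parameter names and the
# extract_features() helper are illustrative, not taken from the paper.
import math
import random
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def extract_features(clips, sr, n_mfcc, n_fft, hop):
    """Mean MFCC vector per clip; stands in for the paper's descriptors."""
    return np.array([
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                             n_fft=n_fft, hop_length=hop).mean(axis=1)
        for y in clips
    ])

def score(params, clips, labels, sr):
    X = extract_features(clips, sr, **params)
    return cross_val_score(SVC(kernel="rbf"), X, labels, cv=5).mean()

def neighbour(params):
    """Randomly perturb one extraction parameter."""
    p = dict(params)
    key = random.choice(list(p))
    choices = {"n_mfcc": [10, 13, 20, 30],
               "n_fft": [512, 1024, 2048, 4096],
               "hop": [128, 256, 512]}
    p[key] = random.choice(choices[key])
    return p

def anneal(clips, labels, sr, t0=1.0, cooling=0.95, steps=200):
    current = {"n_mfcc": 13, "n_fft": 2048, "hop": 512}
    best, cur_s = current, score(current, clips, labels, sr)
    best_s, t = cur_s, t0
    for _ in range(steps):
        cand = neighbour(current)
        s = score(cand, clips, labels, sr)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if s > cur_s or random.random() < math.exp((s - cur_s) / t):
            current, cur_s = cand, s
            if s > best_s:
                best, best_s = cand, s
        t *= cooling
    return best, best_s
```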
The drum kit and the studio: a spectral and dynamic analysis of the relevant components
The research emerged from the need to understand how engineers perceive and record drum
kits in modern popular music. We performed a preliminary, exploratory analysis of
behavioural aspects in drum kit samples. We searched for similarities and differences, hoping
to achieve further understanding of the sonic relationship the instrument shares with others, as
well as its involvement in music making.
Methodologically, this study adopts a pragmatic analysis of audio contents, extraction of
values and comparison of results. We used two methods to analyse the data. The first, a
generalised approach, was an individual analysis of each sample in the chosen eight classes
(composed of common elements in modern drum kits). The second focused on a single
sample that resulted from the down-mix of the previous classes’ sample pools.
For the analysis, we handpicked several subjective and objective features as well as a series of
low-level audio descriptors that hold information regarding the dynamic and frequency
contents of the audio samples. We then conducted a series of processes, which included visual
analysis of three-dimensional graphics and software-based information computing, to retrieve
the analytical data.
Results showed that there are some significant similarities among the classes’ audio features.
This led to the assumption that the a priori experience of engineers could, in fact, be a
collective and subconscious notion, instinctively achieved in a recording session.
In fact, with more research on this subject, one may even find a new way to deal with drum kits in a studio context, speeding up the time-consuming processes and strenuous tasks that this typically involves.

Scientific research in the fields of audio and music has become abundant and prolific, producing highly informative studies that further the understanding of its different areas of focus.
Much of this research focuses on pragmatic aspects: voice and pattern recognition, music information retrieval, intelligent mixing systems, among others. However, although these are formal aspects of great importance, there is a latent lack of documentation on more idyllic and artistic aspects.
The musical instrument we chose to study was the drum kit. Beyond a personal desire to fully understand its intrinsic sonic characteristics for practical applications with tangible results, it is worth noting the absence of scientific discourse and research that has ventured down this path.
Nevertheless, the drum kit has been the subject of deep study in analytical contexts, which also made it a relevant starting point for our seminal approach. On the one hand, the physical matters of drum construction and maintenance, as well as environmental and spatial aspects (recording rooms), are among the factors that most affect the timbral differences heard across multiple examples of drum recordings. Tonal matters (fundamental for a plurality of instruments), however, still lack study and documentation for the drum kit in a broad, worldwide context.
Many sound engineers and musicians hold the preconceived idea that there is an inherent difficulty in relating this percussive element to the remaining instruments in a piece of music. Added to this are subjective matters of taste and preference, as well as other methods that ease the insertion of a rhythmic and semi-harmonic instrument (since it is possible to choose a tuning for the different elements of a drum kit) into a sonic texture that evokes different musical concepts.
Therefore, the core question this study focuses on is: "is it possible to achieve an idyllic sound in the different elements of a drum kit?" In itself, the ambiguity of the answer may point toward a dogmatic and inflexible concept, as well as toward the idea that, so far, no drum recording or drum sound has reached a level of extreme quality, sonority or ubiquity that answers this premise.
We therefore started from this question and proceeded with a pragmatic analysis of sound samples that were as close as possible to a commercial context. We gathered samples in eight predefined classes: bass drums, snare drums, hi-hats, low, mid and high toms, crashes and rides. The samples came from libraries collected after a search for the most renowned manufacturers, those with the largest public following and a tangible commercial track record. From this we retrieved 481 samples.
Once gathered, the samples went through an identification and cataloguing process, along with several signal-processing steps (conversion to monophonic files, equalisation of duration and peak normalisation). Then, using the mathematical computing software MATLAB, we developed code that was instrumental for the analysis of the audio files' features and descriptors.
Finally, we compiled the results obtained and began forming hypotheses about what might give rise to the extracted values. Among the results, ideas emerged that, with further research, may ease the understanding of the sonic behaviour of the different elements, as well as the creation of methods for harmonically combining them.
It is important to note that, in this study, we started from a qualitative concept of sound and, as such, omitted physical aspects that, in essence, substantially influence the emitted sound. Nevertheless, this introductory work aims to preliminarily remedy this lack of subjective concepts with tangible evidence; evidence that still requires further research for its confirmation.
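The signal-processing steps described (down-mix to mono, equalisation of duration, peak normalisation) were implemented in MATLAB; the sketch below is a rough Python analogue rather than the thesis code, and the 44.1 kHz rate and two-second target length are illustrative assumptions.

```python
# Python analogue of the preprocessing described above: convert each sample
# to mono, equalise its duration, and normalise its peak level.
import numpy as np
import librosa

def preprocess(path, sr=44100, seconds=2.0):
    # Down-mix to mono and resample on load.
    y, _ = librosa.load(path, sr=sr, mono=True)
    # Equalise duration: pad with silence or trim to a fixed length.
    y = librosa.util.fix_length(y, size=int(sr * seconds))
    # Peak normalisation to full scale.
    peak = np.max(np.abs(y))
    return y / peak if peak > 0 else y

# Example: build a matrix of equal-length, normalised samples.
# samples = np.stack([preprocess(p) for p in sample_paths])
```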
Automatic classification of drum sounds with indefinite pitch
Automatic classification of musical instruments is an important task for music transcription as well as for professionals such as audio designers, engineers and musicians. Unfortunately, only a limited amount of effort has been devoted to automatically classifying percussion instruments in recent years. The studies that deal with percussion sounds are usually restricted to distinguishing among the instruments in the drum kit, such as toms vs. snare drum vs. bass drum vs. cymbals. In this paper, we are interested in the more challenging task of discriminating sounds produced by the same kind of percussion instrument, specifically sounds from different drum cymbal types. Cymbals are known to have indefinite pitch and nonlinear, chaotic behavior. We also identify how the sound of a specific cymbal was produced (e.g., roll or choke movements performed by a drummer). We achieve an accuracy of 96.59% for cymbal type classification and 91.54% in a classification problem with 12 classes which represent the cymbal type and the manner or region in which the cymbals are struck. Both results were obtained with the Support Vector Machine algorithm using Line Spectral Frequencies as the audio descriptor. We believe that our results can be useful for more detailed automatic drum transcription and for other related applications, as well as for audio professionals.
Fundação de Amparo a Pesquisa e Desenvolvimento do Estado de São Paulo (FAPESP) (grants 2011/17698-5
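The paper reports its best results with an SVM operating on Line Spectral Frequencies (LSFs). The sketch below is a rough illustration rather than the authors' implementation: it derives LSFs from LPC coefficients and feeds them to an SVM; the LPC order, the single-frame analysis, and the helper names are assumptions.

```python
# Hedged sketch of the descriptor/classifier pair named above: Line Spectral
# Frequencies derived from LPC coefficients, classified with an SVM.
import numpy as np
import librosa
from sklearn.svm import SVC

def lpc_to_lsf(a):
    """Convert LPC polynomial a = [1, a1, ..., ap] to line spectral frequencies."""
    p = len(a) - 1
    a_ext = np.concatenate([a, [0.0]])
    # Sum and difference polynomials whose unit-circle roots interleave.
    P = a_ext + a_ext[::-1]
    Q = a_ext - a_ext[::-1]
    roots = np.concatenate([np.roots(P), np.roots(Q)])
    ang = np.angle(roots)
    # Keep one root per conjugate pair; drop the trivial roots near 0 and pi.
    lsf = np.sort(ang[(ang > 1e-4) & (ang < np.pi - 1e-4)])
    return lsf[:p]  # p frequencies in (0, pi)

def lsf_features(y, order=16):
    a = librosa.lpc(y, order=order)   # all-pole model of the cymbal sound
    return lpc_to_lsf(a)

# Usage (cymbal_clips and cymbal_labels are assumed to exist):
# X = np.array([lsf_features(y) for y in cymbal_clips])
# clf = SVC(kernel="rbf").fit(X, cymbal_labels)
```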
Automatic cymbal classification
Dissertation presented at the Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa for the degree of Master in Informatics Engineering (Mestre em Engenharia Informática).
Most of the research on automatic music transcription is focused on the transcription of pitched instruments, like the guitar and the piano. Little attention has been given to unpitched instruments, such as the drum kit, which is a collection of unpitched instruments. Yet, over the last few years this type of instrument has started to garner more attention, perhaps due to the increasing popularity of the drum kit in Western music.
There has been work on automatic music transcription of the drum kit, especially the snare drum, bass drum, and hi-hat. Still, much work remains to be done in order to achieve automatic music transcription of all unpitched instruments. One type of unpitched instrument that has very particular acoustic characteristics and that has received almost no attention from the research community is the drum kit cymbals.
A drum kit contains several cymbals, and usually these are treated as a single instrument or are disregarded entirely by automatic classifiers of unpitched instruments. We propose to fill this gap; as such, the goal of this dissertation is the automatic classification of drum kit cymbal events and the identification of the class of cymbals to which they belong.
As stated, the majority of work in this area has been done with very different percussive instruments, like the snare drum, bass drum, and hi-hat. Cymbals, on the other hand, are very similar to one another: their geometry, alloys, and spectral and sonic traits show just that. Thus, the great achievement of this work is not only correctly classifying the different cymbals, but being able to tell apart such similar instruments, which makes the task even harder.
A latent rhythm complexity model for attribute-controlled drum pattern generation
Most music listeners have an intuitive understanding of the notion of rhythm complexity. Musicologists and scientists, however, have long sought objective ways to measure and model such a distinctively perceptual attribute of music. Whereas previous research has mainly focused on monophonic patterns, this article presents a novel perceptually-informed rhythm complexity measure specifically designed for polyphonic rhythms, i.e., patterns in which multiple simultaneous voices cooperate toward creating a coherent musical phrase. We focus on drum rhythms relating to the Western musical tradition and validate the proposed measure through a perceptual test where users were asked to rate the complexity of real-life drumming performances. Hence, we propose a latent vector model for rhythm complexity based on a recurrent variational autoencoder tasked with learning the complexity of input samples and embedding it along one latent dimension. Aided by an auxiliary adversarial loss term promoting disentanglement, this effectively regularizes the latent space, thus enabling explicit control over the complexity of newly generated patterns. Trained on a large corpus of MIDI files of polyphonic drum recordings, the proposed method proved capable of generating coherent and realistic samples at the desired complexity value. In our experiments, output and target complexities show a high correlation, and the latent space appears interpretable and continuously navigable. On the one hand, this model can readily contribute to a wide range of creative applications, including, for instance, assisted music composition and automatic music generation. On the other hand, it brings us one step closer toward achieving the ambitious goal of equipping machines with a human-like understanding of perceptual features of music.
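As a rough illustration of the attribute-controlled latent space described above, the sketch below implements a recurrent VAE whose first latent dimension is regressed onto a complexity score during training; the layer sizes, names, and the omission of the adversarial disentanglement term are simplifying assumptions, not the paper's architecture.

```python
# Hedged sketch: a recurrent VAE over drum-pattern sequences whose first
# latent dimension is tied to a complexity score, so that dimension can later
# be set explicitly to steer generation.
import torch
import torch.nn as nn

class ComplexityVAE(nn.Module):
    def __init__(self, n_drums=9, hidden=128, latent=32):
        super().__init__()
        self.encoder = nn.GRU(n_drums, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.from_z = nn.Linear(latent, hidden)
        self.decoder = nn.GRU(n_drums, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_drums)

    def forward(self, x):
        # x: (batch, steps, n_drums) binary drum-hit matrix
        _, h = self.encoder(x)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        h0 = torch.tanh(self.from_z(z)).unsqueeze(0)
        dec, _ = self.decoder(x, h0)          # teacher forcing for brevity
        return self.out(dec), mu, logvar, z

def loss_fn(model, x, complexity, beta=1.0, gamma=1.0):
    logits, mu, logvar, z = model(x)
    recon = nn.functional.binary_cross_entropy_with_logits(logits, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Tie latent dimension 0 to the (normalised) complexity rating.
    attr = nn.functional.mse_loss(z[:, 0], complexity)
    return recon + beta * kl + gamma * attr
```

At generation time, the idea is to sample the remaining latent dimensions and fix dimension 0 to the desired complexity value before decoding.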
Automatic sound synthesizer programming: techniques and applications
The aim of this thesis is to investigate techniques for, and applications of, automatic sound synthesizer programming. An automatic sound synthesizer programmer is a system which removes from the user the requirement to explicitly specify parameter settings for a sound synthesis algorithm. Two forms of these systems are discussed in this thesis:
tone matching programmers and synthesis space explorers. A tone matching programmer takes as input a sound synthesis algorithm and a desired target sound, and produces a configuration for that algorithm which causes it to emit a sound similar to the target. The techniques investigated for achieving this are
genetic algorithms, neural networks, hill climbers and data-driven approaches. A synthesis
space explorer provides a user with a representation of the space of possible sounds
that a synthesizer can produce and allows them to interactively explore this space. The
applications of automatic sound synthesizer programming that are investigated include
studio tools, an autonomous musical agent and a self-reprogramming drum machine. The
research employs several methodologies: the development of novel software frameworks
and tools, the examination of existing software at the source code and performance levels,
and user trials of the tools and software. The main contributions made are: a method
for visualisation of sound synthesis space and low dimensional control of sound synthesizers; a general purpose framework for the deployment and testing of sound synthesis and optimisation algorithms in the SuperCollider language sclang; a comparison of a variety of optimisation techniques for sound synthesizer programming; an analysis of sound synthesizer error surfaces; a general purpose sound synthesizer programmer compatible with industry standard tools; an automatic improviser which passes a loose equivalent of the Turing test for Jazz musicians, i.e. being half of a man-machine duet which was rated as one of the best sessions of 2009 on the BBC's 'Jazz on 3' programme.
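Of the tone-matching techniques listed (genetic algorithms, neural networks, hill climbers, data-driven approaches), the hill climber is the easiest to sketch. The following is an illustrative Python version, not code from the thesis: it perturbs normalised synthesizer parameters and keeps changes that reduce an MFCC distance to the target sound; the synthesize() placeholder stands in for whichever synthesis algorithm is being programmed.

```python
# Hedged sketch of the simplest tone-matching strategy named above: a hill
# climber over normalised synthesizer parameters, scored by MFCC distance.
import numpy as np
import librosa

def mfcc_distance(y_a, y_b, sr=44100):
    a = librosa.feature.mfcc(y=y_a, sr=sr).mean(axis=1)
    b = librosa.feature.mfcc(y=y_b, sr=sr).mean(axis=1)
    return float(np.linalg.norm(a - b))

def hill_climb(synthesize, target, n_params, steps=500, step_size=0.05,
               sr=44100, seed=0):
    """Search the unit hypercube of normalised synth parameters."""
    rng = np.random.default_rng(seed)
    params = rng.random(n_params)
    best_err = mfcc_distance(synthesize(params), target, sr)
    for _ in range(steps):
        cand = np.clip(params + rng.normal(0, step_size, n_params), 0, 1)
        err = mfcc_distance(synthesize(cand), target, sr)
        if err < best_err:                 # keep only improving moves
            params, best_err = cand, err
    return params, best_err

# Example with a toy two-parameter "synth" (frequency and harmonic mix):
# def synthesize(p, sr=44100, dur=0.5):
#     t = np.arange(int(sr * dur)) / sr
#     f = 100 + 1900 * p[0]
#     return (1 - p[1]) * np.sin(2 * np.pi * f * t) + p[1] * np.sin(4 * np.pi * f * t)
```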
Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data
Sound event detection (SED) aims to detect when and recognize what sound
events happen in an audio clip. Many supervised SED algorithms rely on strongly
labelled data which contains the onset and offset annotations of sound events.
However, many audio tagging datasets are weakly labelled, that is, only the
presence of the sound events is known, without knowing their onset and offset
annotations. In this paper, we propose a time-frequency (T-F) segmentation
framework trained on weakly labelled data to tackle the sound event detection
and separation problem. In training, a segmentation mapping is applied to a T-F
representation, such as the log mel spectrogram of an audio clip, to obtain T-F
segmentation masks of sound events. The T-F segmentation masks can be used for
separating the sound events from the background scenes in the time-frequency
domain. Then a classification mapping is applied on the T-F segmentation masks
to estimate the presence probabilities of the sound events. We model the
segmentation mapping using a convolutional neural network and the
classification mapping using a global weighted rank pooling (GWRP). In SED,
predicted onset and offset times can be obtained from the T-F segmentation
masks. As a byproduct, separated waveforms of sound events can be obtained from
the T-F segmentation masks. We remixed the DCASE 2018 Task 1 acoustic scene
data with the DCASE 2018 Task 2 sound events data. When mixing under 0 dB, the
proposed method achieved F1 scores of 0.534, 0.398 and 0.167 in audio tagging,
frame-wise SED and event-wise SED, outperforming the fully connected deep
neural network baseline of 0.331, 0.237 and 0.120, respectively. In T-F
segmentation, we achieved an F1 score of 0.218, where previous methods were not
able to do T-F segmentation.
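The global weighted rank pooling (GWRP) step mentioned above can be summarised compactly: sort the segmentation-mask values for each class in descending order and take a weighted average with geometrically decaying weights, so that a decay of 1 recovers average pooling and a small decay approaches max pooling. The sketch below is an illustrative PyTorch version; the decay value and tensor layout are assumptions.

```python
# Hedged sketch of global weighted rank pooling (GWRP): map a T-F segmentation
# mask to a clip-level presence probability per class.
import torch

def gwrp(mask, r=0.995):
    # mask: (batch, classes, time, freq) segmentation probabilities in [0, 1]
    b, c = mask.shape[:2]
    flat = mask.reshape(b, c, -1)
    sorted_vals, _ = torch.sort(flat, dim=-1, descending=True)
    weights = r ** torch.arange(flat.shape[-1], dtype=mask.dtype,
                                device=mask.device)
    return (sorted_vals * weights).sum(dim=-1) / weights.sum()

# presence = gwrp(segmentation_masks)   # (batch, classes) clip-level scores
```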