
    Drum Transcription via Classification of Bar-level Rhythmic Patterns

    Matthias Mauch is supported by a Royal Academy of Engineering Research Fellowship.

    A simulated annealing optimization of audio features for drum classification

    Current methods for the accurate recognition of instruments within music are based on discriminative data descriptors. These are features of the music fragment that capture the characteristics of the audio and suppress details that are redundant for the problem at hand. The extraction of such features from an audio signal requires the user to set certain parameters. We propose a method for optimizing these parameters for a particular task, based on the Simulated Annealing algorithm and Support Vector Machine classification. We show that using an optimized set of audio features improves the recognition accuracy of drum sounds in music fragments.
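The optimization loop this abstract describes — perturb a feature-extraction parameter, keep the change when the classifier's accuracy improves, and occasionally accept a worse setting while the temperature is high — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the objective is a toy stand-in for SVM cross-validation accuracy, and `frame_size` is a hypothetical extraction parameter.

```python
import math
import random

def simulated_annealing(score, neighbour, x0, t0=1.0, cooling=0.95,
                        steps=200, seed=0):
    """Generic simulated-annealing maximiser.

    score     -- objective to maximise (stand-in for SVM
                 cross-validation accuracy on the extracted features)
    neighbour -- proposes a perturbed parameter setting
    """
    rng = random.Random(seed)
    x, best = x0, x0
    t = t0
    for _ in range(steps):
        cand = neighbour(x, rng)
        delta = score(cand) - score(x)
        # Always accept improvements; accept worsenings with a
        # probability that shrinks as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            x = cand
            if score(x) > score(best):
                best = x
        t *= cooling
    return best

# Toy objective: assume accuracy peaks at a frame size of 1024 samples.
def accuracy(frame_size):
    return -((frame_size - 1024) / 1024.0) ** 2

def neighbour(frame_size, rng):
    return max(64, frame_size + rng.choice([-128, -64, 64, 128]))

best = simulated_annealing(accuracy, neighbour, x0=256)
```

With a fixed seed the search settles near the (assumed) optimum of 1024 samples; in the paper's setting the objective would instead be the classifier's accuracy on held-out drum sounds.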

    The drum kit and the studio : a spectral and dynamic analysis of the relevant components

    The research emerged from the need to understand how engineers perceive and record drum kits in modern popular music. We performed a preliminary, exploratory analysis of behavioural aspects in drum kit samples. We searched for similarities and differences, hoping to achieve further understanding of the sonic relationship the instrument shares with others, as well as its involvement in music making. Methodologically, this study adopts a pragmatic analysis of audio contents, extraction of values and comparison of results. We used two methods to analyse the data. The first, a generalised approach, was an individual analysis of each sample in the chosen eight classes (composed of common elements in modern drum kits). The second focused on a single sample that resulted from the down-mix of the previous classes’ sample pools. For the analysis, we handpicked several subjective and objective features as well as a series of low-level audio descriptors that hold information regarding the dynamic and frequency contents of the audio samples. We then conducted a series of processes, which included visual analysis of three-dimensional graphics and software-based information computing, to retrieve the analytical data. Results showed that there are some significant similarities among the classes’ audio features. This led to the assumption that the a priori experience of engineers could, in fact, be a collective and subconscious notion, instinctively achieved in a recording session. In fact, with more research concerning this subject, one may even find a new way to deal with drum kits in a studio context, hastening time-consuming processes and strenuous tasks that are common when doing so.

    Scientific research in audio and music has become rich and prolific, producing highly informative studies that deepen our understanding of its many subfields. Much of this research focuses on pragmatic problems: voice and pattern recognition, music information retrieval, intelligent mixing systems, and so on. While these formal aspects are undeniably important, there is a notable lack of documentation on the more idyllic and artistic ones. The instrument we chose to study was the drum kit: beyond a personal desire to understand its intrinsic sonic characteristics in full, for practical applications with tangible results, scientific discussion and research along this path is largely absent. The drum kit has nevertheless been studied in depth in analytical contexts, which also made it a relevant starting point for our seminal approach. On the one hand, the physical aspects of drum construction and maintenance, together with environmental and spatial factors (recording rooms), are among the aspects that most shape the timbral differences heard across drum recordings. Tonal questions (fundamental for many instruments), on the other hand, remain understudied and poorly documented for the drum kit in any generalised, worldwide context. Many sound engineers and musicians hold the preconceived idea that relating this percussive element to the other instruments in a piece of music is inherently difficult. Added to this are subjective questions of taste and preference, as well as other methods that ease the insertion of a rhythmic and semi-harmonic instrument (different elements of a drum kit can, after all, be tuned) into a sonic texture that evokes different musical concepts. The core question of this study is therefore: “is it possible to achieve an idyllic sound for the different elements of a drum kit?”

    In itself, the ambiguity of the answer may point to a dogmatic and inflexible concept, or to the idea that no drum recording or drum sound has yet reached a level of quality, sonority or ubiquity that settles the premise. Starting from this question, we carried out a pragmatic analysis of sound samples chosen to be as close as possible to a commercial context. We gathered samples from eight predefined classes: bass drums, snares, hi-hats, low, mid and high toms, crashes and rides. The samples came from libraries assembled after a search for the most reputable manufacturers, those with the widest public adoption and a tangible commercial track record, yielding 481 samples in total. Once collected, the samples were identified and catalogued, and underwent some signal processing (conversion to mono files, equalisation of duration and peak normalisation). Next, using the mathematical computing software MATLAB, we developed code that was instrumental in the analysis of audio features and descriptors. Finally, we gathered the results and began forming hypotheses that could explain the extracted values. Among the results, ideas emerged that, with further investigation, may help explain the sonic behaviour of the different drum-kit elements and support methods for combining them harmonically. It is important to note that this study starts from a qualitative concept of sound and therefore omits physical aspects that, in essence, substantially influence the emitted sound. Nevertheless, this introductory work aims to make a preliminary start at grounding these subjective concepts in tangible evidence, evidence that still requires further investigation for its confirmation.
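One of the low-level frequency descriptors this kind of class-by-class comparison relies on is the spectral centroid: the magnitude-weighted mean frequency of a frame. A minimal sketch follows (a naive DFT in plain Python; the study itself used MATLAB, and any real pipeline would use an FFT library):

```python
import math

def spectral_centroid(samples, sr):
    """Spectral centroid (Hz) of a mono frame via a naive DFT.
    A stand-in for the low-level descriptors compared across
    drum-kit classes; O(n^2), for illustration only."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        re = sum(s * math.cos(-2 * math.pi * k * i / n)
                 for i, s in enumerate(samples))
        im = sum(s * math.sin(-2 * math.pi * k * i / n)
                 for i, s in enumerate(samples))
        mags.append(math.hypot(re, im))
    total = sum(mags)
    if total == 0:
        return 0.0
    freqs = [k * sr / n for k in range(n // 2)]
    # Magnitude-weighted mean of the bin frequencies.
    return sum(f * m for f, m in zip(freqs, mags)) / total

# Sanity check: a pure 1 kHz sine should have its centroid at ~1 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(256)]
centroid = spectral_centroid(tone, sr)
```

A bright cymbal sample would yield a much higher centroid than a kick drum, which is the kind of contrast the study's descriptor comparison surfaces.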

    Automatic classification of drum sounds with indefinite pitch

    Automatic classification of musical instruments is an important task for music transcription as well as for professionals such as audio designers, engineers and musicians. Unfortunately, only a limited amount of effort has been devoted to automatically classifying percussion instruments in recent years. The studies that deal with percussion sounds are usually restricted to distinguishing among the instruments in the drum kit, such as toms vs. snare drum vs. bass drum vs. cymbals. In this paper, we are interested in the more challenging task of discriminating sounds produced by the same percussion instrument, specifically sounds from different types of drum cymbals. Cymbals are known to have indefinite pitch and nonlinear, chaotic behavior. We also identify how the sound of a specific cymbal was produced (e.g., roll or choke movements performed by a drummer). We achieve an accuracy of 96.59% for cymbal type classification and 91.54% on a classification problem with 12 classes representing the cymbal type and the manner or region in which the cymbal is struck. Both results were obtained with the Support Vector Machine algorithm using Line Spectral Frequencies as the audio descriptor. We believe that our results can be useful for more detailed automatic drum transcription and for other related applications, as well as for audio professionals. Fundação de Amparo a Pesquisa e Desenvolvimento do Estado de São Paulo (FAPESP) (grants 2011/17698-5)
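Line Spectral Frequencies are derived from the signal's linear-prediction (LPC) coefficients. As a hedged sketch of that first step only, here is a plain-Python Levinson-Durbin recursion (the LPC-to-LSF conversion and the SVM itself are omitted):

```python
import random

def lpc(signal, order):
    """Linear-prediction coefficients a[0..order] (with a[0] = 1) via
    the Levinson-Durbin recursion on the signal's autocorrelation.
    LPC is the starting point for computing Line Spectral Frequencies."""
    n = len(signal)
    # Autocorrelation at lags 0..order.
    r = [sum(signal[i] * signal[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err               # reflection coefficient
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)         # remaining prediction error
    return a

# Sanity check: an AR(1) process x[n] = 0.9*x[n-1] + e[n]
# should yield a[1] close to -0.9.
rng = random.Random(1)
x = [0.0]
for _ in range(2000):
    x.append(0.9 * x[-1] + rng.gauss(0.0, 1.0))
coeffs = lpc(x, 1)
```

In the paper's pipeline the LSFs computed from such coefficients, not the coefficients themselves, form the descriptor fed to the SVM.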

    Automatic cymbal classification

    Dissertation presented at the Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, for the degree of Mestre em Engenharia Informática. Most of the research on automatic music transcription is focused on the transcription of pitched instruments, like the guitar and the piano. Little attention has been given to unpitched instruments, such as the drum kit, which is a collection of unpitched instruments. Yet, over the last few years this type of instrument has started to garner more attention, perhaps due to the increasing popularity of the drum kit in Western music. There has been work on automatic music transcription of the drum kit, especially the snare drum, bass drum, and hi-hat. Still, much work remains to be done to achieve automatic music transcription of all unpitched instruments. An example of a type of unpitched instrument that has very particular acoustic characteristics and that has received almost no attention from the research community is the drum kit's cymbals. A drum kit contains several cymbals, and usually these are treated as a single instrument or are disregarded entirely by automatic classifiers of unpitched instruments. We propose to fill this gap: the goal of this dissertation is the automatic classification of drum kit cymbal events and the identification of the class of cymbals to which they belong. As stated, the majority of work in this area deals with very different percussive instruments, like the snare drum, bass drum, and hi-hat. Cymbals, on the other hand, are very similar to one another; their geometry, alloys, and spectral and sonic traits show just that. Thus, the achievement of this work is not only correctly classifying the different cymbals, but being able to distinguish such similar instruments, which makes the task even harder.

    A latent rhythm complexity model for attribute-controlled drum pattern generation

    Most music listeners have an intuitive understanding of the notion of rhythm complexity. Musicologists and scientists, however, have long sought objective ways to measure and model such a distinctively perceptual attribute of music. Whereas previous research has mainly focused on monophonic patterns, this article presents a novel perceptually-informed rhythm complexity measure specifically designed for polyphonic rhythms, i.e., patterns in which multiple simultaneous voices cooperate toward creating a coherent musical phrase. We focus on drum rhythms relating to the Western musical tradition and validate the proposed measure through a perceptual test where users were asked to rate the complexity of real-life drumming performances. Hence, we propose a latent vector model for rhythm complexity based on a recurrent variational autoencoder tasked with learning the complexity of input samples and embedding it along one latent dimension. Aided by an auxiliary adversarial loss term promoting disentanglement, this effectively regularizes the latent space, thus enabling explicit control over the complexity of newly generated patterns. Trained on a large corpus of MIDI files of polyphonic drum recordings, the proposed method proved capable of generating coherent and realistic samples at the desired complexity value. In our experiments, output and target complexities show a high correlation, and the latent space appears interpretable and continuously navigable. On the one hand, this model can readily contribute to a wide range of creative applications, including, for instance, assisted music composition and automatic music generation. On the other hand, it brings us one step closer toward achieving the ambitious goal of equipping machines with a human-like understanding of perceptual features of music.
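For contrast with the learned, polyphonic measure proposed here, the kind of objective measure that previous monophonic research relied on can be written in a few lines: score each onset by how weak its metrical position is. This is a generic weight-based syncopation sketch, not the paper's latent model:

```python
def metric_weight(step, steps_per_bar=16):
    """Metrical weight of a 16th-note position in a 4/4 bar:
    the downbeat is strongest, off-beat 16ths are weakest."""
    w = 0
    while step % 2 == 0 and (1 << w) < steps_per_bar:
        w += 1
        step //= 2
    return w

def complexity(onsets, steps_per_bar=16):
    """Sum, over all onsets, of how far each position falls below the
    strongest metrical weight: more off-beat onsets -> higher score."""
    max_w = metric_weight(0, steps_per_bar)
    return sum(max_w - metric_weight(s, steps_per_bar) for s in onsets)

four_on_floor = [0, 4, 8, 12]   # hits on the quarter notes
off_beats     = [2, 6, 10, 14]  # the "and" of every beat
```

A four-on-the-floor pattern scores low while the same number of off-beat hits scores high, matching the intuition such measures try to formalise; the paper's contribution is extending this intuition, perceptually validated, to polyphonic patterns.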

    Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data

    Sound event detection (SED) aims to detect when and recognize what sound events happen in an audio clip. Many supervised SED algorithms rely on strongly labelled data which contains the onset and offset annotations of sound events. However, many audio tagging datasets are weakly labelled, that is, only the presence of the sound events is known, without knowing their onset and offset annotations. In this paper, we propose a time-frequency (T-F) segmentation framework trained on weakly labelled data to tackle the sound event detection and separation problem. In training, a segmentation mapping is applied on a T-F representation, such as the log mel spectrogram of an audio clip, to obtain T-F segmentation masks of sound events. The T-F segmentation masks can be used for separating the sound events from the background scenes in the time-frequency domain. Then a classification mapping is applied on the T-F segmentation masks to estimate the presence probabilities of the sound events. We model the segmentation mapping using a convolutional neural network and the classification mapping using global weighted rank pooling (GWRP). In SED, predicted onset and offset times can be obtained from the T-F segmentation masks. As a byproduct, separated waveforms of sound events can be obtained from the T-F segmentation masks. We remixed the DCASE 2018 Task 1 acoustic scene data with the DCASE 2018 Task 2 sound events data. When mixing under 0 dB, the proposed method achieved F1 scores of 0.534, 0.398 and 0.167 in audio tagging, frame-wise SED and event-wise SED, outperforming the fully connected deep neural network baseline of 0.331, 0.237 and 0.120, respectively. In T-F segmentation, we achieved an F1 score of 0.218, where previous methods were not able to do T-F segmentation.
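Global weighted rank pooling, the classification mapping used here, interpolates between max pooling and average pooling: the mask's scores are sorted in descending order and averaged with geometrically decaying weights. A minimal sketch of the pooling function alone (the surrounding CNN is omitted):

```python
def gwrp(scores, r=0.5):
    """Global weighted rank pooling of a flat list of mask scores.
    r = 0 reduces to max pooling, r = 1 to average pooling;
    intermediate r emphasises the highest-scoring T-F bins."""
    ranked = sorted(scores, reverse=True)
    weights = [r ** j for j in range(len(ranked))]
    z = sum(weights)
    return sum(w * s for w, s in zip(weights, ranked)) / z

# Toy mask scores for one sound-event class.
mask_scores = [0.2, 0.8, 0.4]
```

Because a weak label only says whether the event is present somewhere in the clip, pooling the whole mask down to one presence probability is what lets the segmentation network train without onset/offset annotations.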