1,297 research outputs found

    Automatic classification of drum sounds with indefinite pitch

    Automatic classification of musical instruments is an important task for music transcription as well as for professionals such as audio designers, engineers and musicians. Unfortunately, only limited effort has been devoted to automatically classifying percussion instruments in recent years. Studies that deal with percussion sounds are usually restricted to distinguishing among the instruments in the drum kit, such as toms vs. snare drum vs. bass drum vs. cymbals. In this paper, we are interested in the more challenging task of discriminating sounds produced by the same percussion instrument, specifically sounds from different types of drum cymbals. Cymbals are known to have indefinite pitch and nonlinear, chaotic behavior. We also identify how the sound of a specific cymbal was produced (e.g., roll or choke movements performed by a drummer). We achieve an accuracy of 96.59% for cymbal type classification and 91.54% in a classification problem with 12 classes that represent the cymbal type and the manner or region in which the cymbals are struck. Both results were obtained with the Support Vector Machine algorithm using Line Spectral Frequencies as the audio descriptor. We believe that our results can be useful for more detailed automatic drum transcription and for other related applications, as well as for audio professionals. Fundação de Amparo a Pesquisa e Desenvolvimento do Estado de São Paulo (FAPESP) (grants 2011/17698-5

    The drum kit and the studio : a spectral and dynamic analysis of the relevant components

    The research emerged from the need to understand how engineers perceive and record drum kits in modern popular music. We performed a preliminary, exploratory analysis of behavioural aspects in drum kit samples. We searched for similarities and differences, hoping to achieve further understanding of the sonic relationship the instrument shares with others, as well as its involvement in music making. Methodologically, this study adopts a pragmatic analysis of audio contents, extraction of values and comparison of results. We used two methods to analyse the data. The first, a generalised approach, was an individual analysis of each sample in the chosen eight classes (composed of common elements in modern drum kits). The second focused on a single sample that resulted from the down-mix of the previous classes’ sample pools. For the analysis, we handpicked several subjective and objective features as well as a series of low-level audio descriptors that hold information regarding the dynamic and frequency contents of the audio samples. We then conducted a series of processes, which included visual analysis of three-dimensional graphics and software-based information computing, to retrieve the analytical data. Results showed that there are some significant similarities among the classes’ audio features. This led to the assumption that the a priori experience of engineers could, in fact, be a collective and subconscious notion, instinctively achieved in a recording session. In fact, with more research concerning this subject, one may even find a new way to deal with drum kits in a studio context, hastening time-consuming processes and strenuous tasks that are common when doing so. Scientific research in audio and music has become abundant and prolific, producing highly informative studies that improve understanding of its various areas of focus.
Much of this research focuses on pragmatic aspects: speech and pattern recognition, music information retrieval, intelligent mixing systems, among others. However, although these are formal aspects of great importance, there has been a latent lack of documentation on more idyllic and artistic aspects. The musical instrument we chose to study was the drum kit. Beyond a personal desire to understand the full range of its intrinsic sonic characteristics for practical applications with tangible results, there is a notable absence of scientific discourse and research that has ventured down this path. Nevertheless, the drum kit has been studied in depth in analytical contexts, which is another reason it was relevant to develop our seminal approach. On the one hand, the physical matters of drum construction and maintenance, as well as environmental and spatial factors (recording rooms), are among the aspects that most affect timbral differences across many examples of drum recordings. However, tonal questions (fundamental for many instruments) remain under-studied and under-documented for the drum kit worldwide. Many sound engineers and musicians hold the preconceived idea that it is inherently difficult to relate this percussive element to the other instruments in a piece of music. Added to this are subjective questions of taste and preference, as well as other methods that ease the insertion of a rhythmic and semi-harmonic instrument (since a tuning can be chosen for the different elements of a drum kit) into a sonic texture that evokes different musical concepts. The core question of this study is therefore: “is it possible to achieve an idyllic sound in the different elements of a drum kit?”
In itself, the ambiguity of the answer may suggest a dogmatic and inflexible concept, as well as the idea that, so far, no drum recording or sound has reached a level of extreme quality, sonority or ubiquity that answers this premise. We therefore started from this question and carried out a pragmatic analysis of sound samples that were as close as possible to a commercial context. We gathered samples from eight predefined classes: bass drums, snare drums, hi-hats, low, mid and high toms, crashes and rides. The samples came from libraries assembled after a search for the most reputable manufacturers, those with the widest public adoption and a tangible commercial track record. From these we retrieved 481 samples. Once gathered, the samples were identified and catalogued and underwent some signal processing (conversion to mono files, equalisation of duration and peak normalisation). Next, using the mathematical computing software MATLAB, we wrote code that was instrumental to the analysis of features and descriptors of the audio files. Finally, we compiled the results and began formulating hypotheses that could explain the extracted values. Among the results, ideas emerged that, with further research, may ease understanding of the sonic behaviour of the different elements, as well as the creation of methods for combining them harmonically. It is important to note that this study starts from a qualitative concept of sound and, as such, omits physical aspects that in essence substantially influence the emitted sound. Nevertheless, this introductory work aims to address, in a preliminary way, this lack of subjective concepts with tangible evidence.
This evidence still requires further research to be confirmed
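
The signal-processing steps named in the abstract above (conversion to mono, equalisation of duration, peak normalisation) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' MATLAB code: the function names, the zero-padding strategy and the target peak of 1.0 are all assumptions made for the example.

```python
def to_mono(frames):
    """Average the channels of each frame: [(L, R), ...] -> [m, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def fix_length(signal, n_samples):
    """Trim or zero-pad so that every sample has the same duration."""
    trimmed = signal[:n_samples]
    return trimmed + [0.0] * (n_samples - len(trimmed))


def peak_normalise(signal, target_peak=1.0):
    """Scale the signal so its largest absolute value equals target_peak."""
    peak = max((abs(x) for x in signal), default=0.0)
    if peak == 0.0:
        return list(signal)  # silent input: nothing to scale
    return [x * target_peak / peak for x in signal]


def preprocess(frames, n_samples):
    """Apply the three steps in the order the abstract lists them."""
    return peak_normalise(fix_length(to_mono(frames), n_samples))


# Hypothetical two-frame stereo clip, padded to four samples.
clip = [(0.5, 0.5), (-0.25, -0.25)]
processed = preprocess(clip, 4)
```

Normalising after the length is fixed means the peak is measured on exactly the audio that is later analysed.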

    Automatic cymbal classification

    Dissertation presented at the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa for the degree of Master in Informatics Engineering (Mestre em Engenharia Informática). Most of the research on automatic music transcription is focused on the transcription of pitched instruments, like the guitar and the piano. Little attention has been given to unpitched instruments, such as the drum kit, which is a collection of unpitched instruments. Yet, over the last few years this type of instrument has started to garner more attention, perhaps due to the increasing popularity of the drum kit in Western music. There has been work on automatic music transcription of the drum kit, especially the snare drum, bass drum, and hi-hat. Still, much work has to be done in order to achieve automatic music transcription of all unpitched instruments. An example of a type of unpitched instrument that has very particular acoustic characteristics and that has received almost no attention from the research community is the drum kit cymbals. A drum kit contains several cymbals, and these are usually treated as a single instrument or totally disregarded by automatic classifiers of unpitched instruments. We propose to fill this gap and, as such, the goal of this dissertation is the automatic classification of drum kit cymbal events and the identification of which class of cymbals they belong to. As stated, the majority of work in this area deals with very different percussive instruments, like the snare drum, bass drum, and hi-hat. Cymbals, on the other hand, are very similar to one another, as their geometry, type of alloys, and spectral and sound traits show. Thus, the great achievement of this work is not only being able to correctly classify the different cymbals, but being able to distinguish such similar instruments, which makes this task even harder

    A review of automatic drum transcription

    In Western popular music, drums and percussion are an important means to emphasize and shape the rhythm, often defining the musical style. If computers were able to analyze the drum part in recorded music, it would enable a variety of rhythm-related music processing tasks. In particular, the detection and classification of drum sound events by computational methods is considered to be an important and challenging research problem in the broader field of Music Information Retrieval. Over the last two decades, several authors have attempted to tackle this problem under the umbrella term Automatic Drum Transcription (ADT). This paper presents a comprehensive review of ADT research, including a thorough discussion of the task-specific challenges, categorization of existing techniques, and evaluation of several state-of-the-art systems. To provide more insights on the practice of ADT systems, we focus on two families of ADT techniques, namely methods based on Nonnegative Matrix Factorization and Recurrent Neural Networks. We explain the methods’ technical details and drum-specific variations and evaluate these approaches on publicly available datasets with a consistent experimental setup. Finally, the open issues and under-explored areas in ADT research are identified and discussed, providing future directions in this field
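
To make the Nonnegative Matrix Factorization family mentioned above more concrete, here is a minimal sketch of one common simplification: the spectral templates are held fixed and only the per-frame activations are estimated, using the standard multiplicative update for the Euclidean cost. It is not claimed to be any specific system from the review. The toy templates `W`, the activations `H_true` and the iteration count are assumptions chosen for illustration, and plain Python lists are used only to keep the example self-contained.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def nmf_activations(V, W, n_iter=200, eps=1e-9):
    """Estimate H >= 0 such that V ~= W H, keeping the templates W fixed.

    V: magnitude spectrogram, shape (bins, frames)
    W: one spectral template per drum, shape (bins, drums)
    Returns H with shape (drums, frames).
    """
    Wt = transpose(W)
    WtV = matmul(Wt, V)  # constant across iterations
    WtW = matmul(Wt, W)
    n_drums, n_frames = len(Wt), len(V[0])
    H = [[1.0] * n_frames for _ in range(n_drums)]
    for _ in range(n_iter):
        WtWH = matmul(WtW, H)
        # Multiplicative update: H <- H * (W^T V) / (W^T W H)
        H = [[H[k][t] * WtV[k][t] / (WtWH[k][t] + eps)
              for t in range(n_frames)] for k in range(n_drums)]
    return H


# Toy example: 3 frequency bins, 2 drums, 3 frames (all values hypothetical).
W = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
H_true = [[1.0, 0.0, 1.0], [0.0, 2.0, 1.0]]
V = matmul(W, H_true)
H = nmf_activations(V, W)
```

In a transcription setting, peaks in each row of `H` would then be picked as candidate onsets for the corresponding drum. Activations that should be exactly zero shrink only gradually under this update, so a threshold is normally applied.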

    Ontology of music performance variation

    Performance variation in rhythm determines the extent to which humans perceive and feel the effect of rhythmic pulsation and music in general. In many cases, these rhythmic variations can be linked to percussive performance. Such percussive performance variations are often absent in current percussive rhythmic models. The purpose of this thesis is to present an interactive computer model, called the PD-103, that simulates the micro-variations in human percussive performance. This thesis makes three main contributions to existing knowledge: firstly, by formalising a new method for modelling percussive performance; secondly, by developing a new compositional software tool called the PD-103 that models human percussive performance; and finally, by creating a portfolio of different musical styles to demonstrate the capabilities of the software. A large database of recorded samples is classified into zones based upon the vibrational characteristics of the instruments, to model timbral variation in human percussive performance. The degree of timbral variation is governed by principles of biomechanics and human percussive performance. A fuzzy logic algorithm is applied to analyse current and first-order sample selection in order to formulate an ontological description of music performance variation. Asynchrony values were extracted from recorded performances at three different performance skill levels to create “timing fingerprints” which characterise features unique to each percussionist. The PD-103 uses real performance timing data to determine asynchrony values for each synthesised note. The spectral content of the sample database forms a three-dimensional loudness/timbre space, intersecting instrumental behaviour with music composition.
The reparameterisation of the sample database, following the analysis of loudness, spectral flatness, and spectral centroid, provides an opportunity to explore the timbral variations inherent in percussion instruments, to creatively explore dimensions of timbre. The PD-103 was used to create a music portfolio exploring different rhythmic possibilities with a focus on meso-periodic rhythms common to parts of West Africa, jazz drumming, and electroacoustic music. The portfolio also includes new timbral percussive works based on spectral features and demonstrates the central aim of this thesis, which is the creation of a new compositional software tool that integrates human percussive performance and subsequently extends this model to different genres of music
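
Two of the descriptors named above, spectral centroid and spectral flatness, have widely used textbook definitions that can be sketched in a few lines. The thesis's exact formulations are not given in the abstract, so the definitions below (magnitude-weighted mean frequency; geometric over arithmetic mean of the power spectrum), the `floor` parameter and the toy spectra are assumptions for illustration only.

```python
import math


def spectral_centroid(freqs, mags):
    """Magnitude-weighted mean frequency, in the units of freqs."""
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: centroid is undefined, return 0 by convention
    return sum(f * m for f, m in zip(freqs, mags)) / total


def spectral_flatness(mags, floor=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum.

    Values near 1 indicate a noise-like spectrum, values near 0 a tonal one.
    The floor avoids log(0) for empty bins.
    """
    power = [max(m * m, floor) for m in mags]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


# Hypothetical four-bin magnitude spectra.
freqs = [0.0, 100.0, 200.0, 300.0]
tone = [0.0, 1.0, 0.0, 0.0]    # all energy in one bin
noise = [1.0, 1.0, 1.0, 1.0]   # energy spread evenly

tone_centroid = spectral_centroid(freqs, tone)
noise_flatness = spectral_flatness(noise)
tone_flatness = spectral_flatness(tone)
```

Together with a loudness measure, such per-sample values could serve as coordinates in a loudness/timbre space of the kind the abstract describes.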

    Automatic classification of laparos call and playback tests at cuniculture nests

    The vocal behavior of rabbit pups was monitored during their first 15 days of life. It was possible to estimate the average number of vocalizations issued in the nest by correlation calculations applied to spectrographic images. We performed experimental playback tests and observed the behavior between the offspring and the doe during the lactation period. The vocalizations can be important in pup recognition and, consequently, can stimulate the doe to nurse her offspring, decreasing the mortality rate in the breeding phase

    Deep Learning Methods for Instrument Separation and Recognition

    This thesis explores deep learning methods for timbral information processing in polyphonic music analysis. It encompasses two primary tasks: Music Source Separation (MSS) and Instrument Recognition, with focus on applying domain knowledge and utilising dense arrangements of skip-connections in the frameworks in order to reduce the number of trainable parameters and create more efficient models. Musically-motivated Convolutional Neural Network (CNN) architectures are introduced, emphasizing kernels with vertical, square, and horizontal shapes. This design choice allows for the extraction of essential harmonic and percussive features, which enhances the discrimination of different instruments. Notably, this methodology proves valuable for Harmonic-Percussive Source Separation (HPSS) and instrument recognition tasks. A significant challenge in MSS is generalising to new instrument types and music styles. To address this, a versatile framework for adversarial unsupervised domain adaptation for source separation is proposed, particularly beneficial when labeled data for specific instruments is unavailable. The curation of the Tap & Fiddle dataset is another contribution of the research, offering mixed and isolated stem recordings of traditional Scandinavian fiddle tunes, along with foot-tapping accompaniments, fostering research in source separation and metrical expression analysis within these musical styles. Since our perception of timbre is affected in different ways by transient and stationary parts of sound, the research investigates the potential of Transient Stationary-Noise Decomposition (TSND) as a preprocessing step for frame-level recognition. A method that performs TSND of spectrograms and feeds the decomposed spectrograms to a neural classifier is proposed. Furthermore, this thesis introduces a novel deep learning-based approach for pitch streaming, treating the task as a note-level instrument classification. 
Such an approach is modular, meaning that it can also successfully stream predicted note-events and not only labelled ground truth note-event information to corresponding instruments. Therefore, the proposed pitch streaming method enables third-party multi-pitch estimation algorithms to perform multi-instrument AMT

    Data-Driven Query by Vocal Percussion

    The imitation of percussive sounds via the human voice is a natural and effective tool for communicating rhythmic ideas on the fly. Query by Vocal Percussion (QVP) is a subfield in Music Information Retrieval (MIR) that explores techniques to query percussive sounds using vocal imitations as input, usually plosive consonant sounds. In this way, fully automated QVP systems can help artists prototype drum patterns in a comfortable and quick way, smoothing the creative workflow as a result. This project explores the potential usefulness of recent data-driven neural network models in two of the most important tasks in QVP. Algorithms for Vocal Percussion Transcription (VPT) detect and classify vocal percussion sound events in a beatbox-like performance so as to trigger individual drum samples. Algorithms for Drum Sample Retrieval by Vocalisation (DSRV) use input vocal imitations to pick appropriate drum samples from a sound library via timbral similarity. Our experiments with several kinds of data-driven deep neural networks suggest that these achieve better results in both VPT and DSRV than traditional data-informed approaches based on heuristic audio features. We also find that these networks, when paired with strong regularisation techniques, can still outperform data-informed approaches when data is scarce. Finally, we gather several insights into people’s approach to vocal percussion and how user-based algorithms are essential to better model individual differences in vocalisation styles