48 research outputs found

    Deep Embeddings for Robust User-Based Amateur Vocal Percussion Classification

    Vocal Percussion Transcription (VPT) is concerned with the automatic detection and classification of vocal percussion sound events, allowing music creators and producers to sketch drum lines on the fly. Classifier algorithms in VPT systems learn best from small user-specific datasets, which usually restricts modelling to small input feature sets to avoid overfitting. This study explores several deep supervised learning strategies for obtaining informative feature sets for amateur vocal percussion classification. We evaluated the performance of these sets on regular vocal percussion classification tasks and compared them with several baseline approaches, including feature selection methods and a speech recognition engine. The proposed learning models were supervised with label sets at four different levels of abstraction: instrument-level, syllable-level, phoneme-level, and boxeme-level. Results suggest that convolutional neural networks supervised with syllable-level annotations produced the most informative embeddings for classification, which can then be used as input representations for fitting classifiers. Finally, we used back-propagation-based saliency maps to investigate the importance of different spectrogram regions for feature learning.
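
The downstream use the abstract describes, fitting a simple classifier on fixed embeddings, can be sketched as follows. This is a minimal illustration, not the paper's method: the "embeddings" here are synthetic clusters standing in for CNN outputs, and the four classes mirror the usual kick/snare/closed-hat/opened-hat setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for CNN embeddings: four well-separated clusters in a
# 32-dimensional space, one cluster per percussion class.
centers = 5.0 * np.eye(4, 32)
X = np.vstack([c + 0.1 * rng.normal(size=(25, 32)) for c in centers])
y = np.repeat(np.arange(4), 25)

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test embedding by majority vote of its k nearest
    training embeddings (Euclidean distance)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in y_train[idx]])

# Shuffle, split 80/20, and fit/evaluate the classifier on the embeddings.
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]
pred = knn_predict(X[:80], y[:80], X[80:])
acc = (pred == y[80:]).mean()
```

The point of the sketch is the interface: once an embedding network is trained, any lightweight classifier (here k-NN) can be fitted per user on top of it.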

    A New Dataset for Amateur Vocal Percussion Analysis

    The imitation of percussive instruments via the human voice is a natural way for us to communicate rhythmic ideas and, for this reason, it attracts the interest of music makers. Specifically, the automatic mapping of these vocal imitations to the instruments they emulate would allow creators to prototype realistic rhythms more quickly. The contribution of this study is two-fold. Firstly, a new Amateur Vocal Percussion (AVP) dataset is introduced to investigate how people with little or no experience in beatboxing approach the task of vocal percussion. The end goal of this analysis is to help mapping algorithms generalise better between subjects and achieve higher performance. The dataset comprises a total of 9780 utterances recorded by 28 participants, with fully annotated onsets and labels (kick drum, snare drum, closed hi-hat and opened hi-hat). Secondly, we conducted baseline experiments on audio onset detection with the recorded dataset, comparing the performance of four state-of-the-art algorithms in a vocal percussion context.
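
The onset-detection task benchmarked above can be sketched with a deliberately simple detector. This is not one of the four state-of-the-art algorithms compared in the paper; it is a toy energy-based novelty function with the same interface (signal in, onset times out), and the test signal is synthetic.

```python
import numpy as np

def energy_onsets(x, sr, frame=512, hop=256, delta=3.0):
    """Toy onset detector: pick peaks in the half-rectified frame-energy
    difference that exceed delta times the mean novelty."""
    n = 1 + (len(x) - frame) // hop
    energy = np.array([np.sum(x[i * hop:i * hop + frame] ** 2) for i in range(n)])
    novelty = np.maximum(np.diff(energy), 0.0)  # rises in energy only
    thresh = delta * novelty.mean()
    times = []
    for i in range(1, len(novelty) - 1):
        if novelty[i] > thresh and novelty[i] >= novelty[i - 1] and novelty[i] > novelty[i + 1]:
            times.append((i + 1) * hop / sr)  # energy jump into frame i+1
    return times

# Synthetic check: three decaying 1 kHz bursts at 0.2 s, 0.6 s and 1.0 s.
sr = 16000
x = np.zeros(int(1.4 * sr))
burst = np.sin(2 * np.pi * 1000 * np.arange(1600) / sr) * np.exp(-np.arange(1600) / 400)
for t0 in (0.2, 0.6, 1.0):
    s = int(t0 * sr)
    x[s:s + 1600] += burst
onsets = energy_onsets(x, sr)
```

Real vocal-percussion onsets are softer and noisier than these bursts, which is exactly why the paper's cross-algorithm comparison on recorded data matters.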

    BeatBox: End-user Interactive Definition and Training of Recognizers for Percussive Vocalizations

    Interactive end-user training of machine learning systems has received significant attention as a tool for personalizing recognizers. However, most research limits end users to training a fixed set of application-defined concepts. This paper considers additional challenges that arise in end-user support for defining the number and nature of concepts that a system must learn to recognize. We develop BeatBox, a new system that enables end-user creation of custom beatbox recognizers and interactive adaptation of recognizers to an end user’s technique, environment, and musical goals. BeatBox supports rapid end-user exploration of variations in the number and nature of learned concepts, and provides end users with feedback on the reliability of recognizers learned for different potential combinations of percussive vocalizations. In a preliminary evaluation, we observed that end users were able to quickly create usable classifiers, that they explored different combinations of concepts to test alternative vocalizations and to refine classifiers for new musical contexts, and that learnability feedback was often helpful in alerting them to potential difficulties with a desired learning concept.
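
The "learnability feedback" idea above can be sketched as a per-concept reliability score. This is a rough stand-in, not BeatBox's actual method: leave-one-out nearest-centroid accuracy on hypothetical 2-D features, where confusable concepts score low.

```python
import numpy as np

def learnability_report(X, y):
    """Leave-one-out nearest-centroid accuracy per user-defined concept,
    a simple proxy for how reliably each concept can be learned."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    labels = sorted(set(y))
    correct = {c: 0 for c in labels}
    total = {c: 0 for c in labels}
    idx = np.arange(len(y))
    for i in idx:
        mask = idx != i  # hold out one recording
        centroids = {c: X[mask & (y == c)].mean(axis=0) for c in labels}
        pred = min(labels, key=lambda c: np.linalg.norm(X[i] - centroids[c]))
        total[y[i]] += 1
        correct[y[i]] += int(pred == y[i])
    return {c: correct[c] / total[c] for c in labels}

# Hypothetical features: "kick" is well separated, while this user's
# "snare" and "tom" vocalisations overlap and should score poorly.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
     [5.05, 5.05], [5.15, 5.05], [5.05, 5.15]]
y = ["kick"] * 4 + ["snare"] * 3 + ["tom"] * 3
report = learnability_report(X, y)
```

A low score for a concept is the cue an interactive system could surface, prompting the user to drop, merge, or re-record that vocalization.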

    Data-Driven Query by Vocal Percussion

    The imitation of percussive sounds via the human voice is a natural and effective tool for communicating rhythmic ideas on the fly. Query by Vocal Percussion (QVP) is a subfield of Music Information Retrieval (MIR) that explores techniques to query percussive sounds using vocal imitations as input, usually plosive consonant sounds. Fully automated QVP systems can thus help artists prototype drum patterns in a comfortable and quick way, smoothing the creative workflow as a result. This project explores the potential usefulness of recent data-driven neural network models in two of the most important tasks in QVP. Vocal Percussion Transcription (VPT) algorithms detect and classify vocal percussion sound events in a beatbox-like performance so as to trigger individual drum samples. Drum Sample Retrieval by Vocalisation (DSRV) algorithms use input vocal imitations to pick appropriate drum samples from a sound library via timbral similarity. Our experiments with several kinds of data-driven deep neural networks suggest that these achieve better results in both VPT and DSRV than traditional data-informed approaches based on heuristic audio features. We also find that these networks, when paired with strong regularisation techniques, can still outperform data-informed approaches when data is scarce. Finally, we gather several insights into people’s approach to vocal percussion and into how user-based algorithms are essential for better modelling individual differences in vocalisation styles.
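
The DSRV task, retrieving library samples by timbral similarity to a vocal imitation, can be sketched with heuristic features of the kind the project uses as a baseline. Everything below is illustrative: the three-value descriptor, the synthetic "drum samples", and the imitation signal are all stand-ins.

```python
import numpy as np

def timbral_features(x, sr):
    """Toy timbral descriptor: spectral centroid, spectral spread, log-energy."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    p = mag / (mag.sum() + 1e-12)  # normalised spectrum as a distribution
    centroid = (freqs * p).sum()
    spread = np.sqrt((((freqs - centroid) ** 2) * p).sum())
    return np.array([centroid, spread, np.log(np.sum(x ** 2) + 1e-12)])

def retrieve(query_feats, library, k=1):
    """Return the names of the k library samples closest to the query
    in z-scored feature space."""
    names = [n for n, _ in library]
    feats = np.vstack([f for _, f in library])
    mu, sd = feats.mean(axis=0), feats.std(axis=0) + 1e-12
    d = np.linalg.norm((feats - mu) / sd - (query_feats - mu) / sd, axis=1)
    return [names[i] for i in np.argsort(d)[:k]]

# Toy library of synthetic "drum samples" and a low-pitched "imitation".
sr, n = 16000, 2048
t = np.arange(n) / sr
library = [(name, timbral_features(x, sr)) for name, x in [
    ("kick", np.sin(2 * np.pi * 60 * t)),
    ("snare", np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 4000 * t)),
    ("hat", np.sin(2 * np.pi * 6000 * t)),
]]
imitation = np.sin(2 * np.pi * 90 * t)
best = retrieve(timbral_features(imitation, sr), library, k=1)
```

The project's finding is that learned (neural) feature spaces rank such retrievals better than hand-crafted descriptors like these, especially across different users.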

    Real-time detection and classification of drum sounds

    In my undergraduate thesis I worked on the detection of basic drum hits, specifically vocal drumming (beatboxing). The goal was to develop a system that recognises different drum hits in real time. The system first applies onset detection to the input signal. To distinguish between hits I relied largely on the Fourier transform, which yields the frequency spectrum of the signal. From the signal I then computed the most important features and classified the hits into individual classes on that basis. The result is an application that, in real time, replaces each hit in the input signal with a pre-stored drum sound and plays it back. The application can be used by anyone with at least some ability to produce percussive sounds with their mouth, and it can help in learning the differences between basic drum hits. The introductory chapter presents the topic of the thesis, surveys prior work in the area with brief commentary, and outlines how I approached and solved the problem. The second chapter provides the theoretical background that allows a reader without prior knowledge to gain at least a broad understanding of the topic. I first give some general remarks on sound and digital audio recording. Since the thesis concerns the transcription of rhythm, i.e. drums, and specifically vocal drumming, the chapter also presents some general information on drums and vocal percussion. I then move on to more technical descriptions: what a signal is and how to capture it in digital form. This is the subject of digital signal processing, the main topic in computer-based processing of audio that I encountered in the early phase of developing the application. The next, probably most important theoretical chapter covers the Fourier transform, one of the most important mathematical tools for the analysis of linear time-invariant systems and a cornerstone of modern telecommunication systems.
    The fifth chapter is devoted entirely to drum-hit detection. I describe all the algorithms and methods needed to develop the application, such as signal filtering, onset detection, feature extraction and drum-hit classification. This is followed by an overview of the methodology and implementation: an explanation of the tools used, the approach taken, a step-by-step account of the application's design, a presentation of the results and a description of how the application works. The conclusion summarises the work carried out, assesses the application's performance and reliability, and suggests possible improvements.
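
The classification step of the thesis pipeline (onset detection, FFT, feature extraction, hit classification, sample triggering) can be sketched as a rule on a single spectral feature. The thresholds, test tones and sample filenames below are illustrative, not taken from the thesis.

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Brightness feature computed from the magnitude spectrum of one frame."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1 / sr)
    return (freqs * mag).sum() / (mag.sum() + 1e-12)

def classify_hit(frame, sr):
    """Assign a detected hit to a drum class by thresholding the centroid:
    low = kick, mid = snare, high = hi-hat (thresholds are illustrative)."""
    c = spectral_centroid(frame, sr)
    if c < 300:
        return "kick"
    if c < 3000:
        return "snare"
    return "hi-hat"

# Hypothetical mapping from class to the pre-stored sound the app would play.
DRUM_SAMPLES = {"kick": "kick.wav", "snare": "snare.wav", "hi-hat": "hihat.wav"}

# Synthetic frames standing in for vocalised hits of each kind.
sr = 16000
t = np.arange(2048) / sr
low = np.sin(2 * np.pi * 80 * t)
mid = np.sin(2 * np.pi * 1000 * t)
high = np.sin(2 * np.pi * 6000 * t)
```

In the real application each detected onset would be windowed into such a frame, classified, and the matching stored drum sound triggered immediately.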

    Making music through real-time voice timbre analysis: machine learning and timbral control

    People can achieve rich musical expression through vocal sound: see, for example, human beatboxing, which achieves a wide timbral variety through a range of extended techniques. Yet the vocal modality is under-exploited as a controller for music systems. If we can analyse a vocal performance suitably in real time, then this information could be used to create voice-based interfaces with the potential for intuitive and fulfilling levels of expressive control. Conversely, many modern techniques for music synthesis do not imply any particular interface. Should a given parameter be controlled via a MIDI keyboard, or a slider/fader, or a rotary dial? Automatic vocal analysis could provide a fruitful basis for expressive interfaces to such electronic musical instruments. The principal questions in applying vocal-based control are how to extract musically meaningful information from the voice signal in real time, and how to convert that information suitably into control data. In this thesis we address these questions, with a focus on timbral control, and in particular we develop approaches that can be used with a wide variety of musical instruments by applying machine learning techniques to automatically derive the mappings between expressive audio input and control output. The vocal audio signal is construed to include a broad range of expression, in particular encompassing the extended techniques used in human beatboxing. The central contribution of this work is the application of supervised and unsupervised machine learning techniques to automatically map vocal timbre to synthesiser timbre and controls. Component contributions include a delayed decision-making strategy for low-latency sound classification, a regression-tree method to learn associations between regions of two unlabelled datasets, a fast estimator of multidimensional differential entropy, and a qualitative method for evaluating musical interfaces based on discourse analysis.
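
The core mapping problem, vocal timbre features in, synthesiser controls out, can be sketched with a learned lookup. This is a simple nearest-neighbour stand-in, not the thesis's regression-tree association method, and the "brightness" feature and (cutoff, resonance) parameters are hypothetical.

```python
import numpy as np

def fit_knn_mapper(voice_feats, synth_params, k=3):
    """Return a function mapping a vocal timbre feature vector to synth
    controls by averaging the parameters of its k nearest training examples."""
    voice_feats = np.asarray(voice_feats, dtype=float)
    synth_params = np.asarray(synth_params, dtype=float)

    def mapper(q):
        d = np.linalg.norm(voice_feats - np.asarray(q, dtype=float), axis=1)
        nearest = np.argsort(d)[:k]
        return synth_params[nearest].mean(axis=0)

    return mapper

# Hypothetical training pairs: 1-D "brightness" -> (filter cutoff Hz, resonance).
feats = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]]
params = [[200, 0.1], [400, 0.2], [800, 0.3], [1600, 0.4], [3200, 0.5], [6400, 0.6]]
mapper = fit_knn_mapper(feats, params, k=1)
out = mapper([0.35])  # nearest training example has brightness 0.4
```

For real-time use the same mapper would run per analysis frame, which is where the thesis's delayed decision-making strategy for low-latency classification comes in.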

    Daily Eastern News: November 08, 1991


    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the keenly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis.

    Vocal imitation of percussion sounds: On the perceptual similarity between imitations and imitated sounds

    Recent studies have demonstrated the effectiveness of the voice for communicating sonic ideas, and the accuracy with which it can be used to imitate acoustic instruments, synthesised sounds and environmental sounds. However, there has been little research on vocal imitation of percussion sounds, particularly concerning the perceptual similarity between imitations and the sounds being imitated. In the present study we address this by investigating how accurately musicians can vocally imitate percussion sounds, in terms of whether listeners consider the imitations 'more similar' to the imitated sounds than to other same-category sounds. In a vocal production task, 14 musicians imitated 30 drum sounds from five categories (cymbals, hats, kicks, snares, toms). Listeners were then asked to rate the similarity between the imitations and same-category drum sounds via a web-based listening test. We found that imitated sounds received the highest similarity ratings for 16 of the 30 sounds. The similarity between a given drum sound and its imitation was generally rated higher than for imitations of other same-category sounds; however, for some drum categories (snares and toms) certain sounds were consistently considered most similar to the imitations, irrespective of the sound being imitated. Finally, we apply an existing auditory-image-based measure of perceptual similarity between same-category drum sounds to model the similarity ratings using linear mixed-effects regression. The results indicate that this measure is a good predictor of perceptual similarity between imitations and imitated sounds, compared to acoustic features containing only temporal or spectral information.
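
The modelling step, predicting listener similarity ratings from an auditory distance measure, can be sketched with ordinary least squares. This is a fixed-effects-only simplification of the paper's linear mixed-effects regression, and the distance/rating values below are made up for illustration.

```python
import numpy as np

# Hypothetical data: auditory-model distance d between an imitation and a
# drum sound, and the mean listener similarity rating r (higher = more similar).
d = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
r = np.array([4.8, 4.1, 3.2, 2.4, 1.6])

A = np.column_stack([np.ones_like(d), d])     # design matrix: intercept + slope
coef, *_ = np.linalg.lstsq(A, r, rcond=None)  # OLS fit: r ~ b0 + b1 * d
pred = A @ coef
r2 = 1.0 - np.sum((r - pred) ** 2) / np.sum((r - r.mean()) ** 2)
```

A good perceptual measure should behave like this toy data: a strongly negative slope (larger distance, lower rated similarity) and high variance explained. The mixed-effects version additionally absorbs per-listener and per-sound variation through random effects.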