
    A New Dataset for Amateur Vocal Percussion Analysis

    The imitation of percussive instruments via the human voice is a natural way for us to communicate rhythmic ideas and, for this reason, it attracts the interest of music makers. Specifically, automatically mapping these vocal imitations to their emulated instruments would allow creators to prototype rhythms realistically and quickly. The contribution of this study is two-fold. Firstly, a new Amateur Vocal Percussion (AVP) dataset is introduced to investigate how people with little or no experience in beatboxing approach the task of vocal percussion. The end goal of this analysis is to help mapping algorithms generalise better between subjects and achieve higher performance. The dataset comprises a total of 9780 utterances recorded by 28 participants, with fully annotated onsets and labels (kick drum, snare drum, closed hi-hat and opened hi-hat). Secondly, we conducted baseline experiments on audio onset detection with the recorded dataset, comparing the performance of four state-of-the-art algorithms in a vocal percussion context.
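The baseline task above, detecting vocal-percussion onsets in a recording, can be illustrated with a minimal spectral-flux onset detector. This is a generic sketch of the technique family the abstract benchmarks, not any of the four evaluated algorithms; the frame size, hop size and threshold are arbitrary choices for the example.

```python
import numpy as np

def spectral_flux_onsets(signal, sr, frame=1024, hop=512, delta=0.1):
    # Frame the signal and take magnitude spectra
    n_frames = 1 + (len(signal) - frame) // hop
    window = np.hanning(frame)
    mags = np.array([
        np.abs(np.fft.rfft(window * signal[i * hop:i * hop + frame]))
        for i in range(n_frames)
    ])
    # Half-wave rectified spectral flux: per-frame sum of energy rises
    flux = np.maximum(np.diff(mags, axis=0), 0.0).sum(axis=1)
    flux /= flux.max() + 1e-9
    # Simple peak picking: local maxima above a fixed threshold
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > delta and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]]
    return np.array(peaks) * hop / sr  # onset times in seconds

# Synthetic check: two clicks in silence should yield two onsets
sr = 22050
x = np.zeros(sr)
x[5000] = 1.0
x[15000] = 1.0
print(spectral_flux_onsets(x, sr))
```

Real vocal-percussion onsets are softer and noisier than clicks, which is precisely why the dataset's annotated onsets are useful for comparing detectors.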

    Deep Embeddings for Robust User-Based Amateur Vocal Percussion Classification

    Vocal Percussion Transcription (VPT) is concerned with the automatic detection and classification of vocal percussion sound events, allowing music creators and producers to sketch drum lines on the fly. Classifier algorithms in VPT systems learn best from small user-specific datasets, which usually restrict modelling to small input feature sets to avoid overfitting. This study explores several deep supervised learning strategies to obtain informative feature sets for amateur vocal percussion classification. We evaluated the performance of these sets on regular vocal percussion classification tasks and compared them with several baseline approaches, including feature selection methods and a speech recognition engine. The proposed learning models were supervised with several label sets containing information from four different levels of abstraction: instrument-level, syllable-level, phoneme-level, and boxeme-level. Results suggest that convolutional neural networks supervised with syllable-level annotations produced the most informative embeddings for classification, which can then be used as input representations for fitting classifiers. Finally, we used back-propagation-based saliency maps to investigate the importance of different spectrogram regions for feature learning.
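The downstream use described above, fitting a classifier on learned embeddings, can be sketched with a nearest-neighbour vote in embedding space. The 16-dimensional embeddings, class names and cluster geometry below are entirely synthetic stand-ins for what a syllable-supervised CNN would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings for the four drum classes; in the study these
# would come from a trained network, here they are synthetic clusters.
classes = ["kick", "snare", "hh_closed", "hh_open"]
centres = rng.normal(size=(4, 16))
X_train = np.vstack([c + 0.1 * rng.normal(size=(20, 16)) for c in centres])
y_train = np.repeat(np.arange(4), 20)

def knn_predict(x, X, y, k=5):
    # k-nearest-neighbour vote by Euclidean distance in embedding space
    d = np.linalg.norm(X - x, axis=1)
    votes = y[np.argsort(d)[:k]]
    return np.bincount(votes, minlength=4).argmax()

query = centres[1] + 0.1 * rng.normal(size=16)  # noisy "snare" embedding
print(classes[knn_predict(query, X_train, y_train)])  # → snare
```

The point of an informative embedding is exactly that such a simple classifier suffices once the representation separates the classes well.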

    Delayed Decision-making in Real-time Beatbox Percussion Classification

    This is an electronic version of an article published in Journal of New Music Research, 39(3), 203-213, 2010. doi:10.1080/09298215.2010.512979. Journal of New Music Research is available online at: www.tandfonline.com/openurl?genre=article&issn=1744-5027&volume=39&issue=3&spage=20

    Data-Driven Query by Vocal Percussion

    The imitation of percussive sounds via the human voice is a natural and effective tool for communicating rhythmic ideas on the fly. Query by Vocal Percussion (QVP) is a subfield of Music Information Retrieval (MIR) that explores techniques to query percussive sounds using vocal imitations as input, usually plosive consonant sounds. In this way, fully automated QVP systems can help artists prototype drum patterns in a comfortable and quick way, smoothing the creative workflow as a result. This project explores the potential usefulness of recent data-driven neural network models in two of the most important tasks in QVP. Algorithms for Vocal Percussion Transcription (VPT) detect and classify vocal percussion sound events in a beatbox-like performance so as to trigger individual drum samples. Algorithms for Drum Sample Retrieval by Vocalisation (DSRV) use input vocal imitations to pick appropriate drum samples from a sound library via timbral similarity. Our experiments with several kinds of data-driven deep neural networks suggest that these achieve better results in both VPT and DSRV compared to traditional data-informed approaches based on heuristic audio features. We also find that these networks, when paired with strong regularisation techniques, can still outperform data-informed approaches when data is scarce. Finally, we gather several insights into people's approach to vocal percussion and how user-based algorithms are essential to better model individual differences in vocalisation styles.
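The DSRV task above, matching a vocal imitation to a library sample by timbral similarity, can be sketched with a single heuristic feature. The spectral centroid used here is one of the "heuristic audio features" the abstract contrasts with learned ones; the two-sample "library" and the signals are toy constructions for illustration.

```python
import numpy as np

def spectral_centroid(sig, sr=22050):
    # Magnitude-weighted mean frequency of the windowed spectrum
    mag = np.abs(np.fft.rfft(sig * np.hanning(len(sig))))
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / sr)
    return (freqs * mag).sum() / (mag.sum() + 1e-9)

rng = np.random.default_rng(1)
t = np.arange(2048) / 22050

# Toy "drum library": a dark kick-like tone and a bright snare-like noise burst
library = {
    "kick": np.sin(2 * np.pi * 60 * t) * np.exp(-20 * t),
    "snare": rng.normal(size=2048) * np.exp(-20 * t),
}

def retrieve(query, library):
    # Return the library sample whose centroid is closest to the query's
    qc = spectral_centroid(query)
    return min(library, key=lambda name: abs(spectral_centroid(library[name]) - qc))

imitation = rng.normal(size=2048) * np.exp(-30 * t)  # bright, noisy vocalisation
print(retrieve(imitation, library))  # → snare
```

A deep DSRV model replaces the single centroid with a learned similarity space, but the retrieval loop, embed the query, rank the library by distance, is the same.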

    Classification of Classical Indian Music Tabla Taals using Deep Learning

    In this research, we explore the categorization of Classical Indian Music tabla taals, with an emphasis on widely recognized taals such as Addhatrital, Ektal, Rupak, Dadra, Deepchandi, Jhaptal, Trital, and Bhajani. We implement a mixed-methods approach combining Feedforward Neural Networks (FNN) and Convolutional Neural Networks (CNN), which enables us to dissect and categorize tabla taals efficiently. A hallmark of Classical Indian music is its complex and multifaceted rhythms, brought to life by the tabla, and the conception and reproduction of these nuanced taals require technical finesse. With the digital revolution and increasingly eclectic musical preferences, advanced methodologies are needed to identify and classify tabla taals. Our model, employing both FNN and CNN, recognizes diverse features unique to taals like Addhatrital, Ektal, Rupak, Dadra, Deepchandi, Jhaptal, Trital, and Bhajani. It was trained on an assortment of Classical Indian music recordings showcasing these taals, fostering a broader understanding of the minute differences within each rhythmic tradition. For user interaction, we have embedded a Graphical User Interface (GUI) that lets users submit an audio file of tabla music in one of the listed taals and receive on-the-spot recognition, refining their connection to and knowledge of the taal in question. Our research findings are of particular importance for music analysis within Classical Indian Music.
We propose a system that would serve as a tool for amateur tabla players to learn the skill well and master their art. Instructors could also utilize it for training purposes. It opens a new window of possibilities, providing an advanced model for intuitive, swift, and accurate automated identification of tabla taals.
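The classification pipeline described above can be sketched with a one-layer softmax classifier standing in for the FNN. The three taals, the 8-bin onset-pattern features and the training data are all synthetic illustrations, not the paper's actual features or dataset; a real system would classify spectrogram or onset-pattern features extracted from tabla recordings.

```python
import numpy as np

rng = np.random.default_rng(2)
taals = ["Trital", "Jhaptal", "Rupak"]

# Toy feature: a coarse 8-bin onset pattern per cycle, plus noise (synthetic)
def sample_features(taal_idx, n=30):
    base = np.zeros((3, 8))
    base[0, ::2] = 1   # pretend "Trital" pattern
    base[1, ::3] = 1   # pretend "Jhaptal" pattern
    base[2, 1::2] = 1  # pretend "Rupak" pattern
    return base[taal_idx] + 0.1 * rng.normal(size=(n, 8))

X = np.vstack([sample_features(i) for i in range(3)])
y = np.repeat(np.arange(3), 30)

# One-layer softmax classifier trained with plain gradient descent
W = np.zeros((8, 3))
for _ in range(200):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1          # gradient of cross-entropy wrt logits
    W -= 0.1 * X.T @ p / len(y)

query = sample_features(1, n=1)           # a new "Jhaptal" example
print(taals[int((query @ W).argmax())])   # → Jhaptal
```

The FNN and CNN of the paper add hidden layers and convolutional feature extraction on top of this same train-then-predict loop.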

    Making music through real-time voice timbre analysis: machine learning and timbral control

    People can achieve rich musical expression through vocal sound: see for example human beatboxing, which achieves a wide timbral variety through a range of extended techniques. Yet the vocal modality is under-exploited as a controller for music systems. If we can analyse a vocal performance suitably in real time, then this information could be used to create voice-based interfaces with the potential for intuitive and fulfilling levels of expressive control. Conversely, many modern techniques for music synthesis do not imply any particular interface. Should a given parameter be controlled via a MIDI keyboard, or a slider/fader, or a rotary dial? Automatic vocal analysis could provide a fruitful basis for expressive interfaces to such electronic musical instruments. The principal questions in applying vocal-based control are how to extract musically meaningful information from the voice signal in real time, and how to convert that information suitably into control data. In this thesis we address these questions, with a focus on timbral control, and in particular we develop approaches that can be used with a wide variety of musical instruments by applying machine learning techniques to automatically derive the mappings between expressive audio input and control output. The vocal audio signal is construed to include a broad range of expression, in particular encompassing the extended techniques used in human beatboxing. The central contribution of this work is the application of supervised and unsupervised machine learning techniques to automatically map vocal timbre to synthesiser timbre and controls. Component contributions include a delayed decision-making strategy for low-latency sound classification, a regression-tree method to learn associations between regions of two unlabelled datasets, a fast estimator of multidimensional differential entropy and a qualitative method for evaluating musical interfaces based on discourse analysis.
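The core idea above, learning a mapping from vocal timbre features to synthesiser controls, can be sketched with ordinary least squares as a minimal stand-in for the thesis's supervised machine-learning mappings. The feature dimensions, control parameters and training pairs below are all synthetic assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic training pairs: voice timbre features -> synth control values.
# In the thesis these mappings are learned from real audio; here the ground
# truth is an arbitrary linear map plus noise, purely for illustration.
F = rng.normal(size=(100, 6))        # e.g. brightness, noisiness, ... per frame
true_map = rng.normal(size=(6, 2))   # two synth parameters (e.g. cutoff, resonance)
C = F @ true_map + 0.01 * rng.normal(size=(100, 2))

# Least-squares regression recovers the feature-to-control mapping
W, *_ = np.linalg.lstsq(F, C, rcond=None)

new_frame = rng.normal(size=6)       # features from a new vocal frame
controls = new_frame @ W             # mapped control values, usable in real time
print(np.allclose(controls, new_frame @ true_map, atol=0.1))  # → True
```

Real vocal-to-synth mappings are nonlinear, which is why the thesis turns to techniques such as regression trees, but the train-once, map-per-frame structure is the same.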

    Automatic characterization and generation of music loops and instrument samples for electronic music production

    Repurposing audio material to create new music - also known as sampling - was a foundation of electronic music and is a fundamental component of this practice. Currently, large-scale databases of audio offer vast collections of audio material for users to work with. The navigation on these databases is heavily focused on hierarchical tree directories. Consequently, sound retrieval is tiresome and often identified as an undesired interruption in the creative process. We address two fundamental methods for navigating sounds: characterization and generation. Characterizing loops and one-shots in terms of instruments or instrumentation allows for organizing unstructured collections and a faster retrieval for music-making. The generation of loops and one-shot sounds enables the creation of new sounds not present in an audio collection through interpolation or modification of the existing material. To achieve this, we employ deep-learning-based data-driven methodologies for classification and generation.
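The generation-by-interpolation idea above can be sketched in its most naive form, a waveform crossfade between two one-shots. The two synthetic drum tones are illustrative assumptions; the deep generative models the abstract refers to interpolate in a learned latent space instead, which produces perceptually smoother morphs than this raw mix.

```python
import numpy as np

t = np.arange(2048) / 22050
kick = np.sin(2 * np.pi * 60 * t) * np.exp(-25 * t)   # toy low drum one-shot
tom = np.sin(2 * np.pi * 120 * t) * np.exp(-25 * t)   # toy higher drum one-shot

def interpolate(a, b, alpha):
    # Naive crossfade between two waveforms; a generative model would
    # decode the interpolation of their latent codes instead.
    return (1 - alpha) * a + alpha * b

hybrid = interpolate(kick, tom, 0.5)  # a "new" sound between the two
print(hybrid.shape)  # → (2048,)
```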

    Towards Music Structural Segmentation across Genres: Features, Structural Hypotheses, and Annotation Principles

    This work is supported by the China Scholarship Council (CSC) and the EPSRC project (EP/L019981/1) Fusing Semantic and Audio Technologies for Intelligent Music Production and Consumption (FAST-IMPACt). Sandler acknowledges the support of the Royal Society as a recipient of a Wolfson Research Merit Award.

    Music information retrieval: conceptual framework, annotation and user behaviour

    Understanding music is a process both based on and influenced by the knowledge and experience of the listener. Although content-based music retrieval has been given increasing attention in recent years, much of the research still focuses on bottom-up retrieval techniques. In order to make a music information retrieval system appealing and useful to the user, more effort should be spent on constructing systems that both operate directly on the encoding of the physical energy of music and are flexible with respect to users' experiences. This thesis is based on a user-centred approach, taking into account the mutual relationship between music as an acoustic phenomenon and as an expressive phenomenon. The issues it addresses are: the lack of a conceptual framework, the shortage of annotated musical audio databases, the lack of understanding of the behaviour of system users, and the shortage of user-dependent knowledge with respect to high-level features of music. In the theoretical part of this thesis, a conceptual framework for content-based music information retrieval is defined. The proposed conceptual framework - the first of its kind - is conceived as a coordinating structure between the automatic description of low-level music content, and the description of high-level content by the system users. A general framework for the manual annotation of musical audio is outlined as well. A new methodology for the manual annotation of musical audio is introduced and tested in case studies. The results from these studies show that manually annotated music files can be of great help in the development of accurate analysis tools for music information retrieval. Empirical investigation is the foundation on which the aforementioned theoretical framework is built. Two elaborate studies involving different experimental issues are presented. In the first study, elements of signification related to spontaneous user behaviour are clarified.
In the second study, a global profile of music information retrieval system users is given and their description of high-level content is discussed. This study has uncovered relationships between the users' demographical background and their perception of expressive and structural features of music. Such a multi-level approach is exceptional as it included a large sample of the population of real users of interactive music systems. Tests have shown that the findings of this study are representative of the targeted population. Finally, the multi-purpose material provided by the theoretical background and the results from empirical investigations are put into practice in three music information retrieval applications: a prototype of a user interface based on a taxonomy, an annotated database of experimental findings and a prototype semantic user recommender system. Results are presented and discussed for all methods used. They show that, if reliably generated, the use of knowledge on users can significantly improve the quality of music content analysis. This thesis demonstrates that an informed knowledge of human approaches to music information retrieval provides valuable insights, which may be of particular assistance in the development of user-friendly, content-based access to digital music collections.