
    CCOM-HuQin: an Annotated Multimodal Chinese Fiddle Performance Dataset

    HuQin is a family of traditional Chinese bowed string instruments. Playing techniques (PTs) embodied in various playing styles add rich emotional coloring and aesthetic nuance to HuQin performance. These complex techniques make HuQin music a challenging source for fundamental MIR tasks such as pitch analysis, transcription, and score-audio alignment. In this paper, we present a multimodal performance dataset of HuQin music that contains audio-visual recordings of 11,992 single-PT clips and 57 annotated musical pieces of classical excerpts. We systematically describe the HuQin PT taxonomy based on musicological theory and practical use cases. We then introduce the dataset creation methodology and highlight the annotation principles for PTs. We analyze statistics from different aspects to demonstrate the variety of PTs played across HuQin subcategories and perform preliminary experiments to show the potential applications of the dataset in various MIR tasks and cross-cultural music studies. Finally, we propose future work to extend the dataset. Comment: 15 pages, 11 figures.
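
    As an illustration of how such clip-level PT annotations might be consumed in MIR experiments, here is a minimal Python sketch; the CSV layout and field names are hypothetical, not the dataset's actual schema.

    ```python
    # Hypothetical sketch of organising per-clip playing-technique (PT) annotations
    # from a dataset such as CCOM-HuQin. The field names and CSV layout below are
    # assumptions for illustration, not the dataset's actual schema.
    import csv
    from collections import Counter
    from dataclasses import dataclass

    @dataclass
    class PTClip:
        audio_path: str   # path to the audio recording of the clip
        video_path: str   # path to the synchronised video recording
        instrument: str   # HuQin subcategory, e.g. "erhu", "zhonghu", "gaohu"
        technique: str    # annotated playing technique, e.g. "vibrato", "glissando"

    def load_clips(csv_path: str) -> list[PTClip]:
        """Read clip metadata from a CSV file with one row per annotated PT clip."""
        with open(csv_path, newline="", encoding="utf-8") as f:
            return [PTClip(**row) for row in csv.DictReader(f)]

    if __name__ == "__main__":
        clips = load_clips("ccom_huqin_clips.csv")  # hypothetical file name
        # Summarise how often each technique occurs per instrument subcategory.
        counts = Counter((c.instrument, c.technique) for c in clips)
        for (instrument, technique), n in sorted(counts.items()):
            print(f"{instrument:10s} {technique:15s} {n}")
    ```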

    Adaptive Scattering Transforms for Playing Technique Recognition

    Playing techniques contain distinctive information about musical expressivity and interpretation. Yet current research in music signal analysis suffers from a scarcity of computational models for playing techniques, especially in the context of live performance. To address this problem, our paper develops a general framework for playing technique recognition. We propose the adaptive scattering transform, which refers to any scattering transform that includes a stage of data-driven dimensionality reduction over at least one of its wavelet variables, for representing playing techniques. Two adaptive scattering features are presented: frequency-adaptive scattering and direction-adaptive scattering. We analyse seven playing techniques: vibrato, tremolo, trill, flutter-tongue, acciaccatura, portamento, and glissando. To evaluate the proposed methodology, we create a new dataset containing full-length Chinese bamboo flute performances (CBFdataset) with expert playing technique annotations. Once trained on the proposed scattering representations, a support vector classifier achieves state-of-the-art results. We provide explanatory visualisations of scattering coefficients for each technique and verify the system on three additional datasets with various instrumental and vocal techniques: VPset, SOL, and VocalSet.
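
    A minimal sketch of the overall recipe described above (scattering features, a data-driven dimensionality reduction, then a support vector classifier) might look as follows; PCA stands in generically for the paper's frequency- and direction-adaptive reductions, and all shapes, wavelet settings, and data are illustrative assumptions.

    ```python
    # Sketch: scattering coefficients -> data-driven dimensionality reduction -> SVM.
    # PCA is a generic stand-in for the adaptive reductions described in the paper.
    import numpy as np
    from kymatio.numpy import Scattering1D        # pip install kymatio
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    T = 2 ** 13                                   # samples per excerpt (assumed)
    scattering = Scattering1D(J=6, shape=T, Q=8)  # illustrative scattering settings

    def scatter_features(signals: np.ndarray) -> np.ndarray:
        """signals: (n_excerpts, T) -> time-averaged scattering coefficients."""
        coeffs = scattering(signals)              # (n_excerpts, n_paths, n_frames)
        return coeffs.mean(axis=-1)               # average over time

    # Placeholder data standing in for labelled playing-technique excerpts.
    X_train = np.random.randn(32, T)
    y_train = np.random.randint(0, 7, size=32)    # 7 techniques, as in the paper

    clf = make_pipeline(StandardScaler(), PCA(n_components=16), SVC(kernel="rbf"))
    clf.fit(scatter_features(X_train), y_train)
    ```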

    Error Action Recognition on Playing The Erhu Musical Instrument Using Hybrid Classification Method with 3D-CNN and LSTM

    Erhu is a stringed instrument originating from China. There are rules on how to position the player's body and hold the instrument correctly, so a system is needed that can assess every movement of the erhu player. This study discusses action recognition in video using 3D-CNN and LSTM methods. The 3D Convolutional Neural Network (3D-CNN) is a CNN-based method; to improve its ability to capture the information contained in each movement, an LSTM layer is combined with the 3D-CNN model, since LSTM can handle the vanishing gradient problem faced by RNNs. This research uses RGB video as the dataset, and preprocessing and feature extraction cover three main parts: the body, the erhu pole, and the bow. The body segment is preprocessed using body landmarks, while the erhu and bow segments use the Hough Lines algorithm. For classification, we propose two algorithms: a traditional algorithm and a deep learning algorithm. These two classification algorithms produce an error message for every movement of the erhu player.
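
    A hedged sketch of such a hybrid 3D-CNN + LSTM video classifier in PyTorch is shown below; the layer sizes, clip length, and number of error classes are assumptions rather than the authors' exact configuration.

    ```python
    # Toy hybrid 3D-CNN + LSTM classifier for short RGB clips (illustrative only).
    import torch
    import torch.nn as nn

    class CNN3DLSTM(nn.Module):
        def __init__(self, num_classes: int = 4, hidden: int = 128):
            super().__init__()
            # 3D convolutions extract short-range spatio-temporal features.
            self.cnn3d = nn.Sequential(
                nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool3d((1, 2, 2)),
                nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d((None, 4, 4)),   # keep the time dimension intact
            )
            # The LSTM models longer-range temporal dependencies across frames.
            self.lstm = nn.LSTM(input_size=32 * 4 * 4, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, num_classes)

        def forward(self, clip: torch.Tensor) -> torch.Tensor:
            # clip: (batch, channels=3, frames, height, width)
            feats = self.cnn3d(clip)                          # (B, 32, T, 4, 4)
            b, c, t, h, w = feats.shape
            feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
            out, _ = self.lstm(feats)                         # (B, T, hidden)
            return self.head(out[:, -1])                      # logits from the last step

    logits = CNN3DLSTM()(torch.randn(2, 3, 16, 112, 112))     # e.g. 16-frame RGB clips
    ```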

    Computational Modelling and Analysis of Vibrato and Portamento in Expressive Music Performance

    PhD thesis, 148 pp. Vibrato and portamento constitute two expressive devices involving continuous pitch modulation and are widely employed in string, voice, and wind instrument performance. Automatic extraction and analysis of such expressive features form some of the most important aspects of music performance research and represent an under-explored area in music information retrieval. This thesis aims to provide computational and scalable solutions for the automatic extraction and analysis of performed vibratos and portamenti. Applications of the technologies include music learning, musicological analysis, music information retrieval (summarisation, similarity assessment), and music expression synthesis. To automatically detect vibratos and estimate their parameters, we propose a novel method based on the Filter Diagonalisation Method (FDM). The FDM remains robust over short time frames, allowing frame sizes to be set small enough to accurately identify local vibrato characteristics and pinpoint vibrato boundaries. For determining vibrato presence, we test two alternative decision mechanisms: the Decision Tree and Bayes' Rule. The FDM systems are compared to state-of-the-art techniques and obtain the best results, with vibrato rate accuracies above 92.5% and vibrato extent accuracies of about 85%. We use a Hidden Markov Model (HMM) with a Gaussian Mixture Model (GMM) to detect portamento existence. Upon extracting the portamenti, we propose a Logistic Model for describing portamento parameters. The Logistic Model has the lowest root mean squared error and the highest adjusted R-squared value compared to regression models employing Polynomial and Gaussian functions and the Fourier Series. The vibrato and portamento detection and analysis methods are implemented in AVA, an interactive tool for automated detection, analysis, and visualisation of vibrato and portamento. Using the system, we perform cross-cultural analyses of vibrato and portamento differences between erhu and violin performance styles, and between typical male and female roles in Beijing opera singing.
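
    To illustrate the Logistic Model idea for portamento, here is a minimal SciPy sketch that fits a logistic function to a synthetic pitch glide; the parameter names and the pitch track are illustrative, not taken from the thesis.

    ```python
    # Fit a logistic curve to a portamento-like pitch glide and report the RMSE.
    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(t, lower, upper, growth, midpoint):
        """Logistic pitch transition from `lower` to `upper` (e.g. in MIDI numbers)."""
        return lower + (upper - lower) / (1.0 + np.exp(-growth * (t - midpoint)))

    # Synthetic portamento: a glide from MIDI 62 to MIDI 66 over 0.4 s, with noise.
    t = np.linspace(0.0, 0.4, 200)
    pitch = logistic(t, 62.0, 66.0, 30.0, 0.2) + 0.05 * np.random.randn(t.size)

    p0 = [pitch.min(), pitch.max(), 10.0, t.mean()]          # rough initial guess
    params, _ = curve_fit(logistic, t, pitch, p0=p0)
    rmse = np.sqrt(np.mean((pitch - logistic(t, *params)) ** 2))
    print(f"fitted lower/upper/growth/midpoint: {params}, RMSE: {rmse:.4f}")
    ```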

    Extended playing techniques: The next milestone in musical instrument recognition

    The expressive variability in producing a musical note conveys information essential to the modeling of orchestration and style. As such, it plays a crucial role in computer-assisted browsing of massive digital music corpora. Yet, although the automatic recognition of a musical instrument from the recording of a single "ordinary" note is considered a solved problem, automatic identification of instrumental playing technique (IPT) remains largely underdeveloped. We benchmark machine listening systems for query-by-example browsing among 143 extended IPTs for 16 instruments, amounting to 469 triplets of instrument, mute, and technique. We identify and discuss three necessary conditions for significantly outperforming the traditional mel-frequency cepstral coefficient (MFCC) baseline: the addition of second-order scattering coefficients to account for amplitude modulation, the incorporation of long-range temporal dependencies, and metric learning using large-margin nearest neighbors (LMNN) to reduce intra-class variability. Evaluating on the Studio On Line (SOL) dataset, we obtain a precision at rank 5 of 99.7% for instrument recognition (baseline at 89.0%) and of 61.0% for IPT recognition (baseline at 44.5%). We interpret this gain through a qualitative assessment of practical usability and visualization using nonlinear dimensionality reduction. Comment: 10 pages, 9 figures. The source code to reproduce the experiments of this paper is made available at: https://www.github.com/mathieulagrange/dlfm201
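
    The metric-learning stage can be sketched as follows: learn an LMNN transform on fixed-length features and retrieve with nearest neighbours in the learned space. The feature matrix and labels below are placeholders, and the library calls are one possible implementation rather than the authors' code.

    ```python
    # LMNN metric learning followed by nearest-neighbour classification in the
    # learned space. Features would normally be scattering or MFCC statistics.
    import numpy as np
    from metric_learn import LMNN                      # pip install metric-learn
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 40))                     # placeholder feature matrix
    y = rng.integers(0, 10, size=200)                  # placeholder IPT labels

    lmnn = LMNN()                                      # large-margin nearest neighbours
    lmnn.fit(X, y)                                     # learns a linear transform of X

    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(lmnn.transform(X), y)                      # classify in the learned space
    print(knn.score(lmnn.transform(X), y))             # training accuracy (illustrative)
    ```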

    Towards a novel approach for real-time psycho-physiological and emotional response measurement: findings from a small-scale empirical study on sad erhu music

    The aim of the present study is to introduce a novel, systematic approach for real-time psycho-physiological and emotional response measurement. As a vital part of the development of this approach, a small-scale study of four participants was conducted to collect listeners' real-time psycho-physiological and emotional responses to sad erhu music. In this empirical study, four university students (2 Chinese, 2 non-Chinese; 3 females, 1 male) were asked to continuously report their induced musical emotions during listening trials, while their real-time psycho-physiological responses were recorded simultaneously with the Continuous Response Measurement Apparatus (CReMA; Himonides & Welch, 2005). Participants' continuous emotional and psycho-physiological responses were recorded and partially processed in LabChart (ADInstruments Pty Ltd, 2010); further data processing and analyses were performed using the Python programming language. A framework for data processing and analysis, which is replicable and easily extensible, was then constructed to investigate: 1) the impact of music on emotional and psycho-physiological responses; 2) the correlation between psycho-physiological and emotional responses; 3) the differences between real-time and post-listening emotional responses; and 4) the impact of culture, gender, age, personality, and experience of music education on psycho-physiological and emotional responses.
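
    One building block of such a framework, aligning a continuously reported emotion rating with a physiological channel and computing their correlation, could be sketched in Python as follows; the sampling rates and signals are invented for illustration.

    ```python
    # Align a continuous emotion rating with a physiological signal and correlate them.
    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical recordings: emotion rating at 10 Hz, skin conductance at 100 Hz.
    t_emotion = np.arange(0, 60, 0.1)                    # 60 s listening trial
    emotion = np.sin(2 * np.pi * t_emotion / 30)         # placeholder rating curve
    t_physio = np.arange(0, 60, 0.01)
    physio = np.sin(2 * np.pi * t_physio / 30) + 0.1 * np.random.randn(t_physio.size)

    # Resample the physiological signal onto the emotion-rating time base.
    physio_resampled = np.interp(t_emotion, t_physio, physio)
    r, p = pearsonr(emotion, physio_resampled)
    print(f"Pearson r = {r:.3f}, p = {p:.3g}")
    ```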

    Visually Guided Sound Source Separation using Cascaded Opponent Filter Network

    The objective of this paper is to recover the original component signals from a mixture audio with the aid of visual cues of the sound sources. Such a task is usually referred to as visually guided sound source separation. The proposed Cascaded Opponent Filter (COF) framework consists of multiple stages, which recursively refine the source separation. A key element in COF is a novel opponent filter module that identifies and relocates residual components between sources. The system is guided by the appearance and motion of the source, and, for this purpose, we study different representations based on video frames, optical flows, dynamic images, and their combinations. Finally, we propose a Sound Source Location Masking (SSLM) technique, which, together with COF, produces a pixel-level mask of the source location. The entire system is trained end-to-end using a large set of unlabelled videos. We compare COF with recent baselines and obtain state-of-the-art performance on three challenging datasets (MUSIC, A-MUSIC, and A-NATURAL). Project page: https://ly-zhu.github.io/cof-net. Comment: main paper 14 pages, references 3 pages, and supplementary 7 pages; revised argument in Section 3.
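
    A much-simplified sketch of visually guided mask prediction (a toy stand-in for COF's multi-stage opponent filtering, not the authors' architecture) might look like this in PyTorch; all layer sizes are illustrative.

    ```python
    # A visual embedding conditions an audio network that predicts a soft
    # spectrogram mask for the target source (FiLM-style conditioning).
    import torch
    import torch.nn as nn

    class VisuallyGuidedMask(nn.Module):
        def __init__(self, visual_dim: int = 512, channels: int = 32):
            super().__init__()
            self.audio_enc = nn.Sequential(
                nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            )
            # Visual features (e.g. pooled CNN features of the player) produce
            # per-channel scales and shifts that condition the audio features.
            self.film = nn.Linear(visual_dim, 2 * channels)
            self.mask_head = nn.Conv2d(channels, 1, 1)

        def forward(self, mix_spec: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
            # mix_spec: (B, 1, F, T) magnitude spectrogram of the mixture
            # visual:   (B, visual_dim) embedding of the target source's video
            feats = self.audio_enc(mix_spec)
            scale, shift = self.film(visual).chunk(2, dim=-1)
            feats = feats * scale[:, :, None, None] + shift[:, :, None, None]
            return torch.sigmoid(self.mask_head(feats))      # soft mask in [0, 1]

    mixture = torch.randn(2, 1, 256, 64).abs()               # toy mixture spectrogram
    mask = VisuallyGuidedMask()(mixture, torch.randn(2, 512))
    separated = mixture * mask                                # masked target source
    ```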