64 research outputs found
AVA: An Interactive System for Visual and Quantitative Analyses of Vibrato and Portamento Performance Styles
CCOM-HuQin: an Annotated Multimodal Chinese Fiddle Performance Dataset
HuQin is a family of traditional Chinese bowed string instruments. Playing techniques (PTs) embodied in various playing styles add abundant emotional coloring and aesthetic appeal to HuQin performance. These complex techniques make HuQin music a challenging source for fundamental MIR tasks such as pitch analysis, transcription, and score-audio alignment. In this paper, we present a multimodal performance dataset of HuQin music that contains audio-visual recordings of 11,992 single PT clips and 57 annotated musical pieces drawn from classical excerpts. We systematically describe the HuQin PT taxonomy based on musicological theory and practical use cases. We then introduce the dataset creation methodology and highlight the annotation principles for PTs. We analyze statistics from several perspectives to demonstrate the variety of PTs played across HuQin subcategories, and perform preliminary experiments to show the potential applications of the dataset in various MIR tasks and cross-cultural music studies. Finally, we propose future work to extend the dataset.
Comment: 15 pages, 11 figures
Adaptive Scattering Transforms for Playing Technique Recognition
Playing techniques contain distinctive information about musical expressivity and interpretation. Yet, current research in music signal analysis suffers from a scarcity of computational models for playing techniques, especially in the context of live performance. To address this problem, our paper develops a general framework for playing technique recognition. We propose the adaptive scattering transform, which refers to any scattering transform that includes a stage of data-driven dimensionality reduction over at least one of its wavelet variables, for representing playing techniques. Two adaptive scattering features are presented: frequency-adaptive scattering and direction-adaptive scattering. We analyse seven playing techniques: vibrato, tremolo, trill, flutter-tongue, acciaccatura, portamento, and glissando. To evaluate the proposed methodology, we create a new dataset containing full-length Chinese bamboo flute performances (CBFdataset) with expert playing technique annotations. Once trained on the proposed scattering representations, a support vector classifier achieves state-of-the-art results. We provide explanatory visualisations of scattering coefficients for each technique and verify the system over three additional datasets with various instrumental and vocal techniques: VPset, SOL, and VocalSet.
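A minimal sketch of the kind of pipeline described above, using the kymatio and scikit-learn libraries: scattering coefficients are computed per clip, a data-driven reduction stage (stood in for here by plain PCA, not the paper's frequency- or direction-adaptive variants) is applied, and a support vector classifier is trained. The clip list, labels, and frame length are hypothetical.

```python
# Sketch: scattering-based playing-technique classification (simplified).
# PCA stands in for the paper's adaptive dimensionality-reduction stage.
import numpy as np
import librosa
from kymatio.numpy import Scattering1D
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

SR, N = 22050, 2 ** 16                      # sample rate and frame length (assumed)
scattering = Scattering1D(J=8, shape=N, Q=12)

def scatter_features(path):
    """Load one clip and return time-averaged scattering coefficients."""
    y, _ = librosa.load(path, sr=SR, mono=True)
    y = librosa.util.fix_length(y, size=N)
    S = scattering(y)                       # (n_coefficients, n_frames)
    return S.mean(axis=1)                   # average over time

# Placeholder clip paths and technique labels (e.g. from a dataset such as CBFdataset)
clips = ["vibrato_01.wav", "trill_01.wav", "vibrato_02.wav", "trill_02.wav"]
labels = ["vibrato", "trill", "vibrato", "trill"]
X = np.stack([scatter_features(p) for p in clips])

clf = make_pipeline(StandardScaler(), PCA(n_components=0.99), SVC(kernel="rbf"))
clf.fit(X, labels)
```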
Error Action Recognition on Playing The Erhu Musical Instrument Using Hybrid Classification Method with 3D-CNN and LSTM
Erhu is a stringed instrument originating from China. Playing it correctly involves rules for how the player's body should be positioned and how the instrument should be held. A system is therefore needed that can detect the erhu player's movements. This study discusses video-based action recognition using 3D-CNN and LSTM methods. The 3D Convolutional Neural Network (3D-CNN) is a CNN-based method; to better capture the information contained in each movement, an LSTM layer is combined with the 3D-CNN model, since the LSTM handles the vanishing-gradient problem faced by plain RNNs. This research uses RGB video as the dataset, with preprocessing and feature extraction divided into three main parts: the body, the erhu pole, and the bow. Body landmarks are used for preprocessing and feature extraction on the body segment, while the erhu pole and bow segments use the Hough Lines algorithm. For classification, we propose two approaches, a traditional algorithm and a deep learning algorithm, both of which output an error message for each incorrect movement of the erhu player.
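A minimal PyTorch sketch of the kind of 3D-CNN plus LSTM hybrid described above; the layer sizes, clip length, and two-class (correct/error) output are illustrative assumptions, not the study's actual architecture.

```python
# Sketch: 3D-CNN feature extractor followed by an LSTM over time (assumed sizes).
import torch
import torch.nn as nn

class Conv3dLSTM(nn.Module):
    def __init__(self, n_classes=2, hidden=128):
        super().__init__()
        # 3D convolutions over (channels, time, height, width)
        self.cnn = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),         # keep time, pool space
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                               # x: (batch, 3, T, H, W)
        f = self.cnn(x)                                 # (batch, 32, T, 1, 1)
        f = f.squeeze(-1).squeeze(-1).transpose(1, 2)   # (batch, T, 32)
        out, _ = self.lstm(f)
        return self.fc(out[:, -1])                      # classify from the last step

model = Conv3dLSTM()
clip = torch.randn(1, 3, 16, 112, 112)                  # one 16-frame RGB clip (toy input)
logits = model(clip)
```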
Computational Modelling and Analysis of Vibrato and Portamento in Expressive Music Performance
PhD thesis, 148pp.
Vibrato and portamento are two expressive devices involving continuous pitch modulation, and both are widely employed in string, vocal, and wind instrument performance. Automatic extraction and analysis of such expressive features form some of the most important aspects of music performance research and represent an under-explored area in music information retrieval. This thesis aims to provide computational, scalable solutions for the automatic extraction and analysis of performed vibratos and portamenti. Applications of these technologies include music learning, musicological analysis, music information retrieval (summarisation, similarity assessment), and music expression synthesis.
To automatically detect vibratos and estimate their parameters, we propose a novel method based on the Filter Diagonalisation Method (FDM). The FDM remains robust over short time frames, allowing frame sizes to be set small enough to accurately identify local vibrato characteristics and pinpoint vibrato boundaries. To determine vibrato presence, we test two alternative decision mechanisms: the Decision Tree and Bayes' Rule. The FDM systems are compared to state-of-the-art techniques and obtain the best results, with vibrato rate accuracies above 92.5% and vibrato extent accuracies of about 85%.
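As a rough illustration of the quantities involved (not the FDM itself), vibrato rate and extent can be read off a short fundamental-frequency contour with a plain FFT; the frame rate and the synthetic contour below are assumptions for illustration only.

```python
# Sketch: estimate vibrato rate (Hz) and extent (cents) from an f0 contour.
# Plain FFT-based illustration only; it is not the thesis's FDM method.
import numpy as np

def vibrato_rate_extent(f0_hz, frame_rate=100.0):
    """f0_hz: fundamental-frequency contour of one frame, sampled at frame_rate."""
    cents = 1200.0 * np.log2(f0_hz / np.mean(f0_hz))    # pitch deviation in cents
    cents -= np.mean(cents)
    spectrum = np.abs(np.fft.rfft(cents * np.hanning(cents.size)))
    freqs = np.fft.rfftfreq(cents.size, d=1.0 / frame_rate)
    peak = np.argmax(spectrum[1:]) + 1                  # skip the DC bin
    rate = freqs[peak]                                  # modulation frequency in Hz
    extent = (np.max(cents) - np.min(cents)) / 2.0      # half of peak-to-peak, in cents
    return rate, extent

# Toy example: a synthetic 6 Hz vibrato around 440 Hz with +/- 50 cent depth
t = np.arange(0.0, 1.0, 0.01)
f0 = 440.0 * 2.0 ** (50.0 / 1200.0 * np.sin(2 * np.pi * 6.0 * t))
print(vibrato_rate_extent(f0))                          # approximately (6.0, 50.0)
```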
We use a Hidden Markov Model (HMM) with Gaussian Mixture Model (GMM) observations to detect the presence of portamenti. For the extracted portamenti, we propose a Logistic Model for describing portamento parameters. The Logistic Model gives the lowest root mean squared error and the highest adjusted R-squared value compared to regression models employing Polynomial and Gaussian functions and the Fourier Series.
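A minimal SciPy sketch of fitting a logistic curve to a portamento-like pitch glide and reporting its root mean squared error; the synthetic contour and initial parameter guesses are assumptions for illustration.

```python
# Sketch: fit a logistic (sigmoid) model to a pitch glide and report the RMSE.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, lower, upper, growth, midpoint):
    """Logistic curve rising from a lower to an upper pitch asymptote."""
    return lower + (upper - lower) / (1.0 + np.exp(-growth * (t - midpoint)))

# Synthetic portamento: glide from about A4 (440 Hz) to about C5 (523 Hz), with noise
t = np.linspace(0.0, 0.5, 200)
f0 = logistic(t, 440.0, 523.0, 30.0, 0.25) + np.random.normal(0.0, 1.0, t.size)

p0 = [f0.min(), f0.max(), 20.0, t.mean()]               # rough initial guesses
params, _ = curve_fit(logistic, t, f0, p0=p0)
rmse = np.sqrt(np.mean((f0 - logistic(t, *params)) ** 2))
print(params, rmse)
```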
The vibrato and portamento detection and analysis methods are implemented in AVA, an interactive tool for automated detection, analysis, and visualisation of vibrato and portamento. Using the system, we perform cross-cultural analyses of vibrato and portamento differences between erhu and violin performance styles, and between typical male and female roles in Beijing opera singing.
Extended playing techniques: The next milestone in musical instrument recognition
The expressive variability in producing a musical note conveys information
essential to the modeling of orchestration and style. As such, it plays a
crucial role in computer-assisted browsing of massive digital music corpora.
Yet, although the automatic recognition of a musical instrument from the
recording of a single "ordinary" note is considered a solved problem, automatic
identification of instrumental playing technique (IPT) remains largely
underdeveloped. We benchmark machine listening systems for query-by-example
browsing among 143 extended IPTs for 16 instruments, amounting to 469 triplets
of instrument, mute, and technique. We identify and discuss three necessary
conditions for significantly outperforming the traditional mel-frequency
cepstral coefficient (MFCC) baseline: the addition of second-order scattering
coefficients to account for amplitude modulation, the incorporation of
long-range temporal dependencies, and metric learning using large-margin
nearest neighbors (LMNN) to reduce intra-class variability. Evaluating on the
Studio On Line (SOL) dataset, we obtain a precision at rank 5 of 99.7% for
instrument recognition (baseline at 89.0%) and of 61.0% for IPT recognition
(baseline at 44.5%). We interpret this gain through a qualitative assessment of
practical usability and visualization using nonlinear dimensionality reduction.
Comment: 10 pages, 9 figures. The source code to reproduce the experiments of this paper is made available at: https://www.github.com/mathieulagrange/dlfm201
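A minimal sketch of the MFCC-plus-metric-learning idea referenced above, using librosa and the metric-learn package (0.7+) for large-margin nearest neighbors; the file list, labels, and MFCC settings are placeholders and do not reproduce the paper's system.

```python
# Sketch: time-averaged MFCCs, LMNN metric learning, and nearest-neighbour retrieval.
import numpy as np
import librosa
from metric_learn import LMNN
from sklearn.neighbors import KNeighborsClassifier

def mfcc_features(path, sr=22050, n_mfcc=20):
    """Return a time-averaged MFCC vector for one recording."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# Placeholders for a corpus of notes labelled by playing technique
paths = ["note_001.wav", "note_002.wav", "note_003.wav", "note_004.wav"]
labels = np.array([0, 0, 1, 1])                          # integer-encoded technique labels
X = np.stack([mfcc_features(p) for p in paths])

lmnn = LMNN(n_neighbors=1)                               # large-margin nearest neighbors
X_lmnn = lmnn.fit_transform(X, labels)

knn = KNeighborsClassifier(n_neighbors=1).fit(X_lmnn, labels)
print(knn.predict(X_lmnn))
```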
Towards a novel approach for real-time psycho-physiological and emotional response measurement: findings from a small-scale empirical study on sad erhu music
The aim of the present study is to introduce a novel, systematic approach for real-time psycho-physiological and emotional response measurement. As a vital part of the development of this approach, a small-scale study of four participants was conducted to collect listeners’ real-time psycho-physiological and emotional responses to sad erhu music.
In this empirical study, four university students (2 Chinese, 2 non-Chinese; 3 females, 1 male) were asked to continuously report their induced musical emotions during listening trials, while their real-time psycho-physiological responses were recorded simultaneously with the Continuous Response Measurement Apparatus (CReMA; Himonides & Welch, 2005).
Participants’ continuous emotional and psycho-physiological responses were recorded and partially processed in Labchart (ADInstruments Pty Ltd, 2010); further processing and analyses were performed in Python. A replicable and easily extensible framework for data processing and analysis was then constructed to investigate: 1) the impact of music on emotional and psycho-physiological responses; 2) the correlation between psycho-physiological and emotional responses; 3) the differences between real-time and post-listening emotional responses; and 4) the impact of culture, gender, age, personality, and music-education experience on psycho-physiological and emotional responses.
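A minimal sketch of one analysis from such a framework: a continuous psycho-physiological signal and a continuous emotion rating are resampled onto a common time base and correlated. The signals, sampling rates, and duration below are hypothetical.

```python
# Sketch: align two continuous response streams and compute their correlation.
import numpy as np
from scipy.stats import pearsonr

# Hypothetical recordings: a physiological signal at 100 Hz, an emotion rating at 10 Hz
t_physio = np.arange(0.0, 60.0, 0.01)
physio = np.cumsum(np.random.randn(t_physio.size))       # placeholder signal
t_emotion = np.arange(0.0, 60.0, 0.1)
emotion = np.cumsum(np.random.randn(t_emotion.size))     # placeholder ratings

# Resample both streams onto a common 10 Hz grid by linear interpolation
grid = np.arange(0.0, 60.0, 0.1)
physio_rs = np.interp(grid, t_physio, physio)
emotion_rs = np.interp(grid, t_emotion, emotion)

r, p = pearsonr(physio_rs, emotion_rs)
print(f"Pearson r = {r:.2f}, p = {p:.3g}")
```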
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
The objective of this paper is to recover the original component signals from an audio mixture with the aid of visual cues about the sound sources. This task is usually referred to as visually guided sound source separation. The proposed
Cascaded Opponent Filter (COF) framework consists of multiple stages, which
recursively refine the source separation. A key element in COF is a novel
opponent filter module that identifies and relocates residual components
between sources. The system is guided by the appearance and motion of the
source, and, for this purpose, we study different representations based on
video frames, optical flows, dynamic images, and their combinations. Finally,
we propose a Sound Source Location Masking (SSLM) technique, which, together with COF, produces a pixel-level mask of the source location. The entire system is trained end-to-end using a large set of unlabelled videos. We compare COF with recent baselines and obtain state-of-the-art performance on three challenging datasets (MUSIC, A-MUSIC, and A-NATURAL). Project page: https://ly-zhu.github.io/cof-net.
Comment: main paper 14 pages, ref 3 pages, and supp 7 pages. Revised argument in section 3 and
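As a generic illustration of the mask-based separation step that such systems ultimately perform (not the COF network itself), a predicted time-frequency mask can be applied to the mixture spectrogram and inverted back to a waveform; the input file and the random placeholder mask below are hypothetical.

```python
# Sketch: apply a (here random placeholder) time-frequency mask to a mixture
# spectrogram and reconstruct one source; real systems predict the mask with a network.
import numpy as np
import librosa
import soundfile as sf

mixture, sr = librosa.load("mixture.wav", sr=16000, mono=True)   # hypothetical file
stft = librosa.stft(mixture, n_fft=1024, hop_length=256)

mask = np.random.rand(*stft.shape)                # placeholder for a predicted soft mask
source_stft = mask * stft                         # masked complex spectrogram
source = librosa.istft(source_stft, hop_length=256, length=len(mixture))

sf.write("separated_source.wav", source, sr)
```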
- …
