Adaptive Scattering Transforms for Playing Technique Recognition
Playing techniques contain distinctive information about musical expressivity and interpretation. Yet, current research in music signal analysis suffers from a scarcity of computational models for playing techniques, especially in the context of live performance. To address this problem, our paper develops a general framework for playing technique recognition. We propose the adaptive scattering transform, which refers to any scattering transform that includes a stage of data-driven dimensionality reduction over at least one of its wavelet variables, for representing playing techniques. Two adaptive scattering features are presented: frequency-adaptive scattering and direction-adaptive scattering. We analyse seven playing techniques: vibrato, tremolo, trill, flutter-tongue, acciaccatura, portamento, and glissando. To evaluate the proposed methodology, we create a new dataset containing full-length Chinese bamboo flute performances (CBFdataset) with expert playing technique annotations. Once trained on the proposed scattering representations, a support vector classifier achieves state-of-the-art results. We provide explanatory visualisations of scattering coefficients for each technique and verify the system over three additional datasets with various instrumental and vocal techniques: VPset, SOL, and VocalSet.
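The "adaptive" stage described above, data-driven dimensionality reduction over a wavelet variable, can be pictured in a few lines. The sketch below is illustrative only: a magnitude spectrogram stands in for first-order scattering coefficients, and PCA via SVD stands in for the learned reduction over the frequency variable; none of the names or parameters come from the paper.

```python
import numpy as np

def stft_mag(x, win=256, hop=128):
    """Magnitude STFT: a stand-in here for first-order scattering coefficients."""
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))  # shape (time, freq)

def fit_frequency_pca(S_list, k=8):
    """Learn a k-dim linear projection over the frequency variable from training clips."""
    X = np.concatenate(S_list, axis=0)            # pool frames: (N, freq)
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]                                  # (k, freq) projection

t = np.linspace(0, 1, 8000)
# Toy vibrato: a sinusoidally frequency-modulated tone.
clip = np.sin(2 * np.pi * (440 * t + 20 * np.sin(2 * np.pi * 6 * t)))
S = stft_mag(clip)
P = fit_frequency_pca([S], k=8)
adaptive = S @ P.T   # (time, 8): a compact, "frequency-adaptive" feature per frame
print(S.shape[1], "->", adaptive.shape[1])
```

In the paper's terms, the projection plays the role of the data-driven reduction that makes the representation compact; here it is learned from a single toy clip purely to show the mechanics.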
Visually Indicated Sounds
Objects make distinctive sounds when they are hit or scratched. These sounds
reveal aspects of an object's material properties, as well as the actions that
produced them. In this paper, we propose the task of predicting what sound an
object makes when struck as a way of studying physical interactions within a
visual scene. We present an algorithm that synthesizes sound from silent videos
of people hitting and scratching objects with a drumstick. This algorithm uses
a recurrent neural network to predict sound features from videos and then
produces a waveform from these features with an example-based synthesis
procedure. We show that the sounds predicted by our model are realistic enough
to fool participants in a "real or fake" psychophysical experiment, and that
they convey significant information about material properties and physical
interactions.
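The "example-based synthesis procedure" mentioned above can be caricatured as nearest-neighbour retrieval: each predicted feature frame pulls the waveform snippet whose training-set feature is closest, and the snippets are concatenated. The data and names below are hypothetical, a minimal sketch rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical training bank: sound-feature vectors paired with waveform snippets.
train_feats = rng.normal(size=(50, 16))    # 50 examples, 16-dim sound features
train_snips = rng.normal(size=(50, 200))   # matching 200-sample waveform snippets

def synthesize(pred_feats, feats, snips):
    """Example-based synthesis: for each predicted frame, copy the snippet whose
    training feature is nearest in Euclidean distance, then concatenate."""
    d = np.linalg.norm(pred_feats[:, None, :] - feats[None, :, :], axis=2)
    idx = d.argmin(axis=1)                 # nearest training example per frame
    return np.concatenate([snips[i] for i in idx])

# Three "predicted" frames: two near example 3, one near example 7.
pred = train_feats[[3, 7, 3]] + 0.01 * rng.normal(size=(3, 16))
wave = synthesize(pred, train_feats, train_snips)
print(wave.shape)
```

The retrieval step is where realism comes from in such schemes: the output is stitched from genuinely recorded audio rather than generated sample-by-sample.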
Scattering Transform for Playing Technique Recognition
Playing techniques are expressive elements in music performances that
carry important information about music expressivity and interpretation.
When displaying playing techniques in the time–frequency domain, we
observe that each has a distinctive spectro-temporal pattern. Based on
the patterns of regularity, we group commonly-used playing techniques
into two families: pitch modulation-based techniques (PMTs) and pitch
evolution-based techniques (PETs). The former are periodic modulations
that elaborate on stable pitches, including vibrato, tremolo, trill, and
flutter-tongue; while the latter contain monotonic pitch changes, such
as acciaccatura, portamento, and glissando.
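The PMT/PET split above can be caricatured directly on a pitch track: PMTs oscillate about a stable centre frequency, whereas PETs drift monotonically. The heuristic below is a toy illustration of that distinction, not the recognition method of the thesis.

```python
import numpy as np

def technique_family(f0):
    """Crude PMT/PET split on a pitch track: a monotonic trajectory suggests a
    pitch evolution-based technique (PET); oscillation about a stable pitch
    suggests a pitch modulation-based technique (PMT). Illustrative only."""
    d = np.diff(f0)
    sign_flips = np.count_nonzero(np.sign(d[:-1]) != np.sign(d[1:]))
    return "PET" if sign_flips == 0 else "PMT"

t = np.linspace(0, 1, 1000)
vibrato = 440 + 10 * np.sin(2 * np.pi * 6 * t)   # periodic modulation of a stable pitch
glissando = np.linspace(440, 880, 1000)          # monotonic pitch change
print(technique_family(vibrato), technique_family(glissando))
```

Real recordings need far more care (noisy pitch tracks, mixed techniques), which is precisely what motivates the scattering-based representations developed in the thesis.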
In this thesis, we present a general framework based on the scattering transform for playing technique recognition. We propose two
variants of the scattering transform, the adaptive scattering and the
direction-invariant joint scattering. The former provides highly compact
representations that are invariant to pitch transpositions for representing PMTs. The latter captures the spectro-temporal patterns exhibited
by PETs. Using the proposed scattering representations as input, our
recognition system achieves state-of-the-art results. We provide a formal
interpretation of the role of each scattering component confirmed by
explanatory visualisations.
Whereas previously published datasets for playing technique analysis
focused primarily on techniques recorded in isolation, we publicly release
a new dataset to evaluate the proposed framework. The dataset, named
CBFdataset, is the first dataset on the Chinese bamboo flute (CBF),
containing full-length CBF performances and expert annotations of
playing techniques. To provide evidence on the generalisability of the
proposed framework, we test it over three additional datasets with a
variety of playing techniques. Finally, to explore the applicability of
the proposed scattering representations to general audio classification
problems, we introduce two additional applications: one applies the
adaptive scattering for identifying performers in polyphonic orchestral
music and the other uses the joint scattering for detecting and classifying
chick calls.