Finding Tori: Self-supervised Learning for Analyzing Korean Folk Song
In this paper, we introduce a computational analysis of a field-recording dataset of approximately 700 hours of Korean folk songs, recorded around the 1980s-90s. Because most of the songs were sung by non-expert musicians without accompaniment, the dataset poses several challenges. To address them, we applied self-supervised learning with a convolutional neural network based on pitch contour, and then analyzed how the model captures the musical concept of tori, a classification system defined by a specific scale, ornamental notes, and an idiomatic melodic contour. The experimental results show that our approach captures the characteristics of tori better than traditional pitch histograms do. Using our approach, we examined how musical discussions proposed in existing scholarship manifest in the actual field recordings of Korean folk songs.
Comment: Accepted at the 24th International Society for Music Information Retrieval Conference (ISMIR 2023).
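The abstract compares the learned representation against traditional pitch histograms. As a point of reference, here is a minimal sketch of such a histogram baseline, assuming an f0 contour in Hz with 0 marking unvoiced frames (the function name and conventions are illustrative, not from the paper):

```python
import numpy as np

def pitch_class_histogram(f0_hz, ref_hz=440.0, n_bins=12):
    """Fold an f0 contour (Hz, 0 = unvoiced) into a normalized 12-bin pitch-class histogram."""
    voiced = f0_hz[f0_hz > 0]
    if voiced.size == 0:
        return np.zeros(n_bins)
    # convert Hz to fractional semitones relative to A4, then fold into one octave
    semitones = 12.0 * np.log2(voiced / ref_hz)
    bins = np.round(semitones).astype(int) % n_bins
    hist = np.bincount(bins, minlength=n_bins).astype(float)
    return hist / hist.sum()

contour = np.array([440.0, 0.0, 466.16, 440.0, 880.0])  # A4, unvoiced, Bb4, A4, A5
hist = pitch_class_histogram(contour)  # octave-equivalent: A and A5 share bin 0
```

Such a histogram discards temporal order, which is precisely why contour-based learned representations can capture ornamentation and idiomatic melodic movement that histograms cannot.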
K-pop Lyric Translation: Dataset, Analysis, and Neural-Modelling
Lyric translation, a field studied for over a century, is now attracting computational linguistics researchers. We identified two limitations in previous studies. First, lyric translation research has predominantly focused on Western genres and languages, with no previous study centering on K-pop despite its popularity. Second, the field suffers from a lack of publicly available datasets; to the best of our knowledge, no such dataset exists. To broaden the scope of genres and languages in lyric translation studies, we introduce a novel singable lyric translation dataset, approximately 89% of which consists of K-pop song lyrics. This dataset aligns Korean and English lyrics line-by-line and section-by-section. We leveraged this dataset to unveil unique characteristics of K-pop lyric translation, distinguishing it from other extensively studied genres, and to construct a neural lyric translation model, thereby underscoring the importance of a dedicated dataset for singable lyric translation.
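Line-by-line alignment makes it possible to quantify singability constraints such as syllable-count matching between source and translated lines. A minimal sketch, not from the paper, exploiting the fact that each precomposed Hangul block corresponds to one sung syllable, with a crude vowel-group heuristic for English:

```python
import re

def korean_syllables(line):
    # each precomposed Hangul syllable block (U+AC00 - U+D7A3) is one sung syllable
    return sum(1 for ch in line if 0xAC00 <= ord(ch) <= 0xD7A3)

def english_syllables(line):
    # crude heuristic: count contiguous vowel groups per word (ignores silent e, etc.)
    return sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
               for w in re.findall(r"[A-Za-z']+", line))

def syllable_gap(korean_line, english_line):
    """Absolute syllable-count mismatch for one aligned line pair."""
    return abs(korean_syllables(korean_line) - english_syllables(english_line))

gap = syllable_gap("사랑해", "I love you")  # 3 Hangul blocks vs. 4 vowel groups
```

Aggregating such gaps over all aligned pairs is one simple way to compare how tightly different genres preserve syllable counts in translation.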
Predicting performance difficulty from piano sheet music images
Estimating the performance difficulty of a musical score is crucial in music education for adequately designing students' learning curricula. Although the Music Information Retrieval community has recently shown interest in this task, existing approaches mainly use machine-readable scores, leaving the broader case of sheet music images unaddressed. Building on previous work involving sheet music images, we use a mid-level representation, the bootleg score, which describes notehead positions relative to staff lines, coupled with a transformer model. This architecture is adapted to our task by introducing an encoding scheme that reduces the encoded sequence length to one-eighth of the original size. For evaluation, we consider five datasets -- more than 7500 scores with up to 9 difficulty levels -- two of which were compiled specifically for this work. The results obtained when pretraining the scheme on the IMSLP corpus and fine-tuning it on the considered datasets prove the proposal's validity, with the best-performing model achieving a balanced accuracy of 40.34% and a mean squared error of 1.33. Finally, we provide access to our code, data, and models for transparency and reproducibility.
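The abstract does not detail the length-reducing encoding, but the general idea of shrinking a binary bootleg-score sequence by a factor of eight can be illustrated by bit-packing groups of eight consecutive binary columns into integer tokens. This is a hypothetical sketch of such a reduction, not the paper's actual scheme:

```python
import numpy as np

def pack_columns(bootleg, group=8):
    """Pack groups of `group` consecutive binary columns into integer tokens.

    bootleg: (staff_positions, time) binary array.
    Returns a (staff_positions, time // group) integer array where each entry
    encodes `group` bits, shrinking the sequence length by that factor.
    """
    pos, t = bootleg.shape
    t_trim = t - (t % group)                       # drop any ragged tail
    chunks = bootleg[:, :t_trim].reshape(pos, t_trim // group, group)
    weights = 1 << np.arange(group)                # bit weights 1, 2, 4, ...
    return (chunks * weights).sum(axis=-1)

bootleg = np.zeros((62, 16), dtype=int)            # 62 staff positions is a common choice
bootleg[10, 0] = 1                                 # notehead at time step 0
bootleg[10, 9] = 1                                 # notehead at time step 9
packed = pack_columns(bootleg)                     # 16 columns -> 2 tokens per row
```

Shorter sequences matter here because transformer self-attention cost grows quadratically with sequence length.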
VoiceCoach: Interactive evidence-based training for voice modulation skills in public speaking
The modulation of voice properties such as pitch, volume, and speed is crucial for delivering a successful public speech. However, mastering different voice modulation skills is challenging. Though many guidelines are available, they are often not practical enough to apply in different public speaking situations, especially for novice speakers. We present VoiceCoach, an interactive evidence-based approach to facilitate the effective training of voice modulation skills. Specifically, we analyzed the voice modulation skills in 2623 high-quality speeches (i.e., TED Talks) and use them as a benchmark dataset. Given a voice input, VoiceCoach automatically recommends good voice modulation examples from the dataset based on the similarity of both sentence structures and voice modulation skills. Immediate and quantitative visual feedback is provided to guide further improvement. Expert interviews and a user study provide support for the effectiveness and usability of VoiceCoach.
Comment: Accepted by CHI '2
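Retrieval by similarity of sentence-structure and modulation features can be sketched as nearest-neighbour search under cosine similarity. This is a minimal illustration with hypothetical feature vectors, not VoiceCoach's actual pipeline:

```python
import numpy as np

def recommend(query_vec, bank, k=3):
    """Return indices of the k reference speeches most cosine-similar to the query.

    bank: (n_examples, n_features) matrix of precomputed feature vectors,
    e.g. concatenated sentence-structure and modulation descriptors.
    """
    bank = np.asarray(bank, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    sims = bank @ q / (np.linalg.norm(bank, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)[:k]                   # highest similarity first

bank = np.array([[1.0, 0.0],                       # toy 2-D feature vectors
                 [0.0, 1.0],
                 [0.9, 0.1]])
top = recommend([1.0, 0.0], bank, k=2)             # nearest two examples
```

In practice the returned indices would map back to TED Talk snippets, which the system then surfaces alongside visual feedback.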
A Timbre-based Approach to Estimate Key Velocity from Polyphonic Piano Recordings
Estimating the key velocity of each note from polyphonic piano music is a highly challenging task. Previous work addressed the problem by estimating note intensity using a polyphonic note model, but this approach is limited because note intensity is vulnerable to various factors in the recording environment. In this paper, we propose a novel method to estimate key velocity by focusing on timbre change, another cue associated with key velocity. To this end, we separate the individual notes of polyphonic piano music using non-negative matrix factorization (NMF) and feed them into a neural network trained to discriminate timbre changes according to key velocity. Combining the note intensity of the separated notes with the statistics of the neural network prediction, the proposed method estimates key velocity on the MIDI note velocity scale. Evaluation on the Saarland Music Data and the MAPS dataset shows promising results in terms of robustness to changes in the recording environment.
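The separation step relies on NMF, which factors a nonnegative matrix V (e.g. a magnitude spectrogram) into note templates W and activations H. A self-contained sketch of the classic Lee-Seung multiplicative updates, illustrative only since the abstract does not specify the exact NMF configuration used:

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0):
    """Factor nonnegative V ~ W @ H via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3               # note templates (columns)
    H = rng.random((rank, m)) + 1e-3               # per-note activations over time
    for _ in range(iters):
        # updates keep W, H nonnegative and monotonically reduce ||V - WH||_F
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

V = np.array([[1.0, 0.0, 2.0],                     # toy "spectrogram", exactly rank 2
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 3.0]])
W, H = nmf(V, rank=2, iters=500)
```

Reconstructing each note as an outer product of one template column with its activation row yields the isolated note signals fed to the timbre classifier.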
Automatic piano fingering from partially annotated scores using autoregressive neural networks
Paper presented at the 30th ACM International Conference on Multimedia (MM '22), held October 10-14, 2022, in Lisbon, Portugal.
Piano fingering is a creative and highly individualised task that musicians acquire progressively in their first years of music education. Pianists must learn to choose the order of fingers with which to play the piano keys, because scores do not have fingerings and hand movements engraved as other technique elements are. Numerous research efforts have addressed automatic piano fingering based on a previous dataset of 150 score excerpts fully annotated by multiple expert annotators. However, most piano sheets include only partial annotations for problematic finger and hand movements. We introduce a novel dataset for the task, the ThumbSet dataset, containing 2523 pieces with partial and noisy annotations of piano fingering crowdsourced from non-expert annotators. As part of our methodology, we propose two autoregressive neural networks with beam search decoding that model automatic piano fingering as a sequence-to-sequence learning problem, considering the correlation between output finger labels. We design the first model with the exact pitch representation of previous proposals. The second model uses graph neural networks to more effectively represent polyphony, whose treatment has been a common issue across previous studies. Finally, we fine-tune the models on the existing expert-annotated dataset. The evaluation shows that (1) we achieve high performance when training on the ThumbSet dataset and (2) the proposed models outperform the state-of-the-art hidden Markov model and recurrent neural network baselines. Code, dataset, models, and results are made available to enhance the task's reproducibility, including a new framework for evaluation.
This work is supported in part by the project Musical AI - PID2019-111403GB-I00/AEI/10.13039/501100011033 funded by the Spanish Ministerio de Ciencia, Innovación y Universidades (MCIU) and the Agencia Estatal de Investigación (AEI), Sogang University Research Grant 202110035.01, and JSPS KAKENHI Nos. 21K12187 and 22H03661.
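Beam search decoding over finger labels (thumb = 1 through little finger = 5) can be sketched generically: keep the top-scoring partial sequences at each note and extend each with every candidate finger. A minimal illustration with hypothetical per-note log-probabilities, not the paper's trained models:

```python
import math

def beam_search(step_logprobs, beam_width=3):
    """Decode a finger sequence by beam search over per-note log-probabilities.

    step_logprobs: one dict {finger_label: log_prob} per note, as a model
    would emit autoregressively. Returns the best (sequence, score) pair.
    """
    beams = [((), 0.0)]                            # (partial sequence, cumulative score)
    for dist in step_logprobs:
        candidates = [(seq + (f,), score + lp)
                      for seq, score in beams
                      for f, lp in dist.items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]            # prune to the top beam_width
    return beams[0]

steps = [{1: math.log(0.7), 2: math.log(0.3)},     # toy distributions for two notes
         {2: math.log(0.6), 3: math.log(0.4)}]
best_seq, best_score = beam_search(steps)
```

Unlike greedy decoding, the beam can recover sequences where a locally suboptimal finger choice enables a much better continuation, which matters when output finger labels are strongly correlated.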
Predicting performance difficulty from piano sheet music images
This work has been accepted at the 24th International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy, October 5-9, 2023.
This work is funded by the Spanish Ministerio de Ciencia, Innovación y Universidades (MCIU) and the Agencia Estatal de Investigación (AEI) within the Musical AI Project - PID2019-111403GB-I00/AEI/10.13039/501100011033, and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Korea Government (MSIT) (NRF-2022R1F1A1074566).