7 research outputs found
A Real-Time Lyrics Alignment System Using Chroma And Phonetic Features For Classical Vocal Performance
The goal of real-time lyrics alignment is to take live singing audio as input
and to pinpoint the exact position within given lyrics on the fly. The task can
benefit real-world applications such as the automatic subtitling of live
concerts or operas. However, designing a real-time model poses a great
challenge due to the constraints of only using past input and operating within
a minimal latency. Furthermore, due to the lack of datasets for real-time
models for lyrics alignment, previous studies have mostly evaluated with
private in-house datasets, resulting in a lack of standard evaluation methods.
This paper presents a real-time lyrics alignment system for classical vocal
performances with two contributions. First, we improve the lyrics alignment algorithm by finding an optimal combination of the chromagram and the phonetic posteriorgram (PPG), which capture melodic and phonetic features of the singing voice, respectively. Second, we recast the Schubert Winterreise Dataset (SWD), which contains multiple performance renditions of the same pieces, as an evaluation set for real-time lyrics alignment.
Comment: To appear in IEEE ICASSP 202
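The feature combination described above can be sketched as a weighted frame-wise cost fed to an alignment pass. This is a minimal illustration, not the authors' system: the cosine metric, the 0.5 weight, and the plain (offline) DTW backtrack are all assumptions; a real-time system would instead use an online DTW variant restricted to past input.

```python
import numpy as np

def cosine_dist(a, b):
    """Pairwise cosine distance between rows of a and rows of b."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-9)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-9)
    return 1.0 - a_n @ b_n.T

def combined_cost(chroma_live, chroma_ref, ppg_live, ppg_ref, w=0.5):
    """Weighted sum of a chroma (melodic) distance and a PPG (phonetic)
    distance between live and reference frames; w is the chroma weight."""
    return (w * cosine_dist(chroma_live, chroma_ref)
            + (1.0 - w) * cosine_dist(ppg_live, ppg_ref))

def dtw_path(cost):
    """Plain DTW over the combined cost matrix (offline stand-in for
    the online DTW a live system would require)."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the end to recover the optimal alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

With identical live and reference features the combined cost is zero on the diagonal, and the recovered path is the identity alignment.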
A Human-Computer Duet System for Music Performance
Virtual musicians have become a remarkable phenomenon in the contemporary
multimedia arts. However, most of the virtual musicians nowadays have not been
endowed with abilities to create their own behaviors, or to perform music with
human musicians. In this paper, we first create a virtual violinist, who can
collaborate with a human pianist to perform chamber music automatically without
any intervention. The system incorporates the techniques from various fields,
including real-time music tracking, pose estimation, and body movement
generation. In our system, the virtual musician's behavior is generated based
on the given music audio alone, and such a system results in a low-cost,
efficient and scalable way to produce human and virtual musicians'
co-performance. The proposed system has been validated in public concerts.
Objective quality assessment approaches and possible ways to systematically improve the system are also discussed.
Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips
This paper discusses real-time alignment of audio signals of a music performance to the corresponding score (a.k.a. score following) that can handle tempo changes, errors, and arbitrary repeats and/or skips (repeats/skips) in performances. This type of score following is particularly useful in
automatic accompaniment for practices and rehearsals, where errors and
repeats/skips are often made. Simple extensions of the algorithms previously
proposed in the literature are not applicable in these situations for scores of
practical length due to the problem of large computational complexity. To cope
with this problem, we present two hidden Markov models of monophonic
performance with errors and arbitrary repeats/skips, and derive efficient
score-following algorithms under the assumption that the prior probability distributions of score positions before and after repeats/skips are independent of each other. We confirmed real-time operation of the algorithms with music scores of practical length (around 10000 notes) on a modern laptop, and verified that they recover tracking of the input performance within 0.7 s on average after repeats/skips in clarinet performance data. Further improvements and extensions for polyphonic signals are also discussed.
Comment: 12 pages, 8 figures, version accepted in IEEE/ACM Transactions on Audio, Speech, and Language Processing
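The independence assumption above is what makes the update cheap: if the probability of jumping from score position i to j factorises into a term depending only on i and a term depending only on j, the forward-filtering step needs one sum over all positions rather than a full N×N transition. The sketch below is a generic illustration of that idea, not the paper's exact model; the stay/step/jump probabilities and the uniform destination prior are assumptions.

```python
import numpy as np

def forward_step(alpha, obs_lik, p_stay=0.6, p_step=0.3, p_jump=0.1,
                 jump_prior=None):
    """One forward-filtering update of a score-position HMM that allows
    arbitrary repeats/skips. Because the jump transition factorises as
    P(i -> j) = p_jump * jump_prior[j] (the destination prior is
    independent of the origin), the update costs O(N), not O(N^2)."""
    n = alpha.size
    if jump_prior is None:
        jump_prior = np.full(n, 1.0 / n)       # uniform destination prior
    pred = p_stay * alpha                      # stay on the same score event
    pred[1:] += p_step * alpha[:-1]            # advance to the next event
    pred += p_jump * alpha.sum() * jump_prior  # repeat/skip to anywhere
    post = pred * obs_lik                      # fold in the frame likelihood
    return post / post.sum()
```

For example, if the filter is certain the performer is at event 0 but the incoming frame strongly matches event 5, the jump component lets the posterior relocate in a single update instead of stepping through events 1–4.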
PIANO SCORE FOLLOWING WITH HIDDEN TIMBRE OR TEMPO USING SWITCHING KALMAN FILTERS
Thesis (Ph.D.) - Indiana University, University Graduate School/Luddy School of Informatics, Computing, and Engineering, 2020
Score following is an AI technique that enables computer programs to “listen to” music: to track a live musical performance in relation to its written score, even through variations in tempo and amplitude. This ability can be transformative for musical practice, performance, education, and composition. Although score following has been successful on monophonic music (one note at a time), it has difficulty with polyphonic music. One of the greatest challenges is piano music, which is highly polyphonic. This dissertation investigates ways to overcome the challenges of polyphonic music, and casts light on the nature of the problem through empirical experiments. I propose two new approaches inspired by two important aspects of music that humans perceive during a performance: the pitch profile of the sound, and the timing. In the first approach, I account for changing timbre within a chord by tracking harmonic amplitudes to improve matching between the score and the sound. In the second approach, I model tempo in music, allowing it to deviate from the default tempo value within reasonable statistical constraints. For both methods, I develop switching Kalman filter models that are interesting in their own right. I have conducted experiments on 50 excerpts of real piano performances, and analyzed the results both case-by-case and statistically.
The results indicate that modeling tempo is essential for piano score following, and the second method significantly outperformed the state-of-the-art baseline. The first method, although it did not show improvement over the baseline, still represents a promising new direction for future research. Taken together, the results contribute to a more nuanced and multifaceted understanding of the score-following problem.
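The tempo model above can be pictured as a linear Kalman filter whose hidden state is score position and tempo, with tempo allowed to drift within statistical constraints. The sketch below is one branch of such a model, not the dissertation's implementation: the state layout, noise values, and position-only observation are illustrative assumptions; a switching Kalman filter would run several such branches (e.g. different tempo-noise regimes) and weight them by likelihood.

```python
import numpy as np

def kalman_tempo_step(x, P, z, dt, q_pos=1e-4, q_tempo=1e-2, r=1e-2):
    """One predict/update step of a linear Kalman filter with state
    x = [score position (beats), tempo (beats/s)], observing a noisy
    position measurement z derived from the audio."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-tempo motion model
    Q = np.diag([q_pos, q_tempo])           # process noise: tempo may drift
    H = np.array([[1.0, 0.0]])              # we observe position only
    # Predict.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update.
    y = z - (H @ x)[0]                      # innovation
    S = (H @ P @ H.T)[0, 0] + r             # innovation variance
    K = (P @ H.T)[:, 0] / S                 # Kalman gain
    x = x + K * y
    P = P - np.outer(K, (H @ P)[0])
    return x, P
```

Fed position measurements from a performance at a steady 2 beats/s, the filter's tempo estimate converges to 2 even when initialised at 1, since repeated position innovations are absorbed into the tempo component via the covariance.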
Determining the effect of human cognitive biases in social robots for human-robot interactions
The research presented in this thesis describes a model for aiding human-robot interactions in which a robot displays behaviours shaped by ‘human’ cognitive biases. The aim of this work is to study how cognitive biases can affect human-robot interactions in the long term.
Currently, most human-robot interactions are based on a set of well-ordered and structured rules, which repeat regardless of the person or social situation. This tends to produce an unrealistic interaction, which can make it difficult for humans to relate ‘naturally’ to the social robot after a number of interactions. The main issue with these interactions is that the social robot shows a very structured set of behaviours and, as such, acts unnaturally and mechanically in terms of social interactions. On the other hand, fallible behaviours (e.g. forgetfulness, inability to understand others’ emotions, bragging, blaming others) are common in humans and can be seen in regular social interactions. Some of these fallible behaviours are caused by various cognitive biases. Researchers have studied and developed various humanlike skills (e.g. personality, emotional expression, traits) in social robots to make their behaviours more humanlike; as a result, social robots can perform various humanlike actions, such as walking, talking, gazing or emotional expression. But common human behaviours such as forgetfulness, inability to understand others’ emotions, bragging or blaming are not present in current social robots; such behaviours, which exist in and influence people, have not been explored in social robots.
The study presented in this thesis developed five cognitive biases in three different robots in four separate experiments to understand the influence of such cognitive biases on human–robot interactions. The results show that participants initially preferred interacting with the robot showing cognitively biased behaviours over the robot without such behaviours. In my first two experiments, the robots (e.g., ERWIN, MyKeepon) each interacted with the participants using a single cognitive bias (misattribution and empathy gap, respectively), and participants enjoyed the interactions involving such bias effects: for example, forgetfulness, source confusion, or always showing exaggerated happiness or sadness. In my later experiments, participants interacted with the robot (e.g., MARC) three times, with a time interval between interactions, and the results show that liking of the interactions in which the robot showed biased behaviours decreased less than liking of the interactions in which the robot did not show any biased behaviours.
In the current thesis, I describe the investigation of these traits of forgetfulness, the inability to understand others’ emotions, and bragging and blaming behaviours, which are influenced by cognitive biases, and I also analyse people’s responses to robots displaying such biased behaviours in human–robot interactions.
Real-Time Audio-to-Score Alignment Using Particle Filter for Coplayer Music Robots (doi:10.1155/2011/384651)
Copyright © 2011 Takuma Otsuka et al.
Our goal is to develop a coplayer music robot capable of presenting musical expression together with humans. Although many instrument-performing robots exist, they may have difficulty playing with human performers due to the lack of a synchronization function. The robot has to follow differences in humans’ performances, such as temporal fluctuations, to play with human performers. We classify synchronization and musical expression into two levels, (1) the melody level and (2) the rhythm level, to cope with erroneous synchronizations. The idea is as follows: when synchronization with the melody is reliable, the robot responds to the pitch it hears; when synchronization is uncertain, it tries to follow the rhythm of the music. Our method estimates the score position at the melody level and the tempo at the rhythm level. The reliability of the score position estimation is extracted from the probability distribution of the score position. The experimental results demonstrate that our method outperforms an existing score-following system on 16 of 20 polyphonic songs. The error in the prediction of the score position is reduced by 69% on average. The results also revealed that the switching mechanism alleviates the error in the estimation of the score position.
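The score-position/tempo estimation and the reliability measure described above can be sketched as one step of a bootstrap particle filter. This is a generic illustration, not the paper's implementation: the diffusion noise, the Gaussian frame likelihood in the test, and the spread threshold for the reliability flag are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def pf_step(positions, tempos, weights, obs_lik, dt=0.1, pos_noise=0.05):
    """One bootstrap particle-filter step over (score position, tempo)
    particles. obs_lik maps an array of candidate positions to the
    likelihood of the current audio frame at each of them. The spread
    of the weighted posterior serves as a reliability score: a
    melody/rhythm switch would trust the position estimate when the
    posterior is concentrated, and fall back to tempo-only prediction
    otherwise."""
    # Predict: advance each particle by its tempo plus diffusion.
    positions = positions + tempos * dt + rng.normal(0.0, pos_noise, positions.size)
    # Update: reweight by the frame likelihood and normalise.
    weights = weights * obs_lik(positions)
    weights = weights / weights.sum()
    # Posterior mean and a simple concentration-based reliability flag.
    est = float(np.sum(weights * positions))
    spread = float(np.sqrt(np.sum(weights * (positions - est) ** 2)))
    reliable = spread < 0.5   # threshold chosen arbitrarily for the sketch
    # Systematic resampling keeps the particle set from degenerating.
    u = (rng.random() + np.arange(weights.size)) / weights.size
    idx = np.minimum(np.searchsorted(np.cumsum(weights), u), weights.size - 1)
    return positions[idx], tempos[idx], np.full(weights.size, 1.0 / weights.size), est, reliable
```

When the frame likelihood is sharply peaked, the posterior concentrates and the step reports a reliable position; a flat or multimodal likelihood widens the spread, which is the cue to fall back to rhythm-level (tempo-only) tracking.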