776 research outputs found

Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips

Author: Nakamura Eita
Nakamura Tomohiko
Sagayama Shigeki
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/12/2015

This paper discusses real-time alignment of audio signals of music performances to the corresponding score (a.k.a. score following) that can handle tempo changes, errors, and arbitrary repeats and/or skips (repeats/skips) in performances. This type of score following is particularly useful in automatic accompaniment for practices and rehearsals, where errors and repeats/skips are common. Simple extensions of algorithms previously proposed in the literature are not applicable in these situations for scores of practical length because of their large computational complexity. To cope with this problem, we present two hidden Markov models of monophonic performance with errors and arbitrary repeats/skips, and derive efficient score-following algorithms under the assumption that the prior probability distributions of score positions before and after repeats/skips are independent of each other. We confirmed real-time operation of the algorithms with music scores of practical length (around 10000 notes) on a modern laptop, and their ability to track the input performance within 0.7 s on average after repeats/skips in clarinet performance data. Further improvements and an extension to polyphonic signals are also discussed.
Comment: 12 pages, 8 figures, version accepted in IEEE/ACM Transactions on Audio, Speech, and Language Processing.

arXiv.org e-Print Archive

Singing voice correction using canonical time warping

Author: Chen Ming-Tso
Chi Tai-Shih
Luo Yin-Jyun
Su Li
Publication date: 23/11/2017

Expressive singing voice correction is an appealing but challenging problem. A robust time-warping algorithm that synchronizes two singing recordings can provide a promising solution. We therefore propose to address the problem with canonical time warping (CTW), which aligns amateur singing recordings to professional ones. A new pitch contour is generated from the alignment information, and the pitch-corrected singing voice is synthesized through a vocoder. The objective evaluation shows that CTW is robust against pitch-shifting and time-stretching effects, and the subjective test demonstrates that CTW outperforms the other methods, including DTW and commercial auto-tuning software. Finally, we demonstrate the applicability of the proposed method in a practical, real-world scenario.

arXiv.org e-Print Archive

Crossref

Performance Following: Real-Time Prediction of Musical Sequences Without a Score

Author: Plumbley MD
Stark AM
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

(c) 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

Crossref

Queen Mary Research Online

Surrey Research Insight

Deep neural networks with voice entry estimation heuristics for voice separation in symbolic music representations

Author: de Valk R.
Weyde T.

In this study we explore the use of deep feedforward neural networks for voice separation in symbolic music representations. We experiment with different network architectures, varying the number and size of the hidden layers, and with dropout. We integrate into the models two voice entry estimation heuristics that estimate the entry points of the individual voices in the polyphonic fabric. These heuristics serve to reduce error propagation at the beginning of a piece, which, as we have shown in previous work, can seriously hamper model performance. The models are evaluated on the 48 fugues from Johann Sebastian Bach’s The Well-Tempered Clavier and his 30 inventions, a dataset that we curated and make publicly available. We find that a model with two hidden layers yields the best results. Using more layers does not lead to a significant performance improvement. Furthermore, we find that our voice entry estimation heuristics are highly effective in reducing error propagation, improving performance significantly. Our best-performing model significantly outperforms our previous models and, depending on the evaluation metric, performs close to or better than the reported state of the art.

City Research Online

Performance Following: Tracking a Performance Without a Score

Author: Plumbley MD
Stark AM
Publication date: 01/01/2010

EPSRC Doctoral Training Award; EPSRC Leadership Fellowship

CiteSeerX

Crossref

University of Surrey

Queen Mary Research Online

Surrey Research Insight

Rhythm extraction from polyphonic symbolic music

Author: Arnaud Guillaume
Gaymay Rémi
Giraud Mathieu
Groult Richard
Levé Florence
Séguin Cyril
Publication venue: HAL CCSD
Publication date: 01/10/2011

We focus on the rhythmic component of symbolic music similarity, proposing several ways to extract a monophonic rhythmic signature from a symbolic polyphonic score. To go beyond the simple extraction of all time intervals between onsets (noteson extraction), we select notes according to their length (short and long extractions) or their intensities (intensity+/− extractions). Once the rhythm is extracted, we use dynamic programming to compare several sequences. We report results of an analysis of the size of rhythm patterns that are specific to a unique piece, as well as experiments on similarity queries (ragtime music and Bach chorale variations). These results show that long and intensity+ extractions are often good choices for rhythm extraction. We conclude that, even for polyphonic symbolic music, rhythm alone can be enough to identify a piece or to perform pertinent music similarity queries, especially when a well-chosen rhythm extraction is used.

HAL - Lille 3

INRIA a CCSD electronic archive server

Hal-Diderot

MSTRE-Net: Multistreaming Acoustic Modeling for Automatic Lyrics Transcription

Author: Ahlbäck S
Demirel E
Dixon S
Publication date: 01/01/2021

This paper makes several contributions to automatic lyrics transcription (ALT) research. Our main contribution is a novel variant of the Multistreaming Time-Delay Neural Network (MTDNN) architecture, called MSTRE-Net, which processes the temporal information using multiple streams in parallel with varying resolutions. This keeps the network more compact, and thus gives faster inference and an improved recognition rate compared with using identical TDNN streams. In addition, two novel preprocessing steps prior to training the acoustic model are proposed. First, we suggest using recordings from both monophonic and polyphonic domains when training the acoustic model. Second, we tag monophonic and polyphonic recordings with distinct labels to discriminate between non-vocal silence and music instances during alignment. Moreover, we present a new test set that is considerably larger and has higher musical variability than the existing datasets used in the ALT literature, while maintaining the gender balance of the singers. Our best-performing model sets the state of the art in lyrics transcription by a large margin. For reproducibility, we publicly share the identifiers to retrieve the data used in this paper.

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Queen Mary Research Online

The effect of using pitch and duration for symbolic music retrieval

Author: Suyoto I
Uitdenbogerd A
Publication venue: RMIT University (Melbourne, Australia)
Publication date: 01/01/2008

Reasonable retrieval effectiveness can be achieved when retrieving symbolically encoded polyphonic music (multiple notes at once) via melody queries, using relatively simple pattern matching techniques based on pitch sequences. Earlier work showed that adding duration information was not particularly helpful for improving retrieval effectiveness. In this paper we demonstrate that defining the duration information as the time interval between consecutive notes does lead to more effective retrieval when combined with pitch-based pattern matching in our collection of over 14 000 MIDI files.

RMIT Research Repository