Joint singing voice separation and F0 estimation with deep U-net architectures
Vocal source separation and fundamental frequency estimation in music are tightly related tasks. The outputs of vocal source separation systems have previously been used as inputs to vocal fundamental frequency estimation systems; conversely, vocal fundamental frequency has been used as side information to improve vocal source separation. In this paper, we propose several different approaches for jointly separating vocals and estimating fundamental frequency. We show that joint learning is advantageous for these tasks, and that a stacked architecture which first performs vocal separation outperforms the other configurations considered. Furthermore, the best joint model achieves state-of-the-art results for vocal-f0 estimation on the iKala dataset. Finally, we highlight the importance of performing polyphonic, rather than monophonic, vocal-f0 estimation for many real-world cases.
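The stacked configuration described in the abstract, where separation runs first and F0 estimation consumes its output, can be illustrated with a toy pipeline. This is a hedged sketch in plain NumPy: a synthetic sine stands in for the vocal track, a pass-through stands in for the learned separator, and a simple autocorrelation peak picker stands in for the paper's U-net F0 estimator; none of it is the authors' code.

```python
import numpy as np

def autocorr_f0(frame, sr, fmin=80.0, fmax=1000.0):
    """Estimate the F0 of one frame by picking the autocorrelation peak
    within the lag range corresponding to [fmin, fmax]."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)                     # smallest lag considered
    hi = int(sr / fmin)                     # largest lag considered
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

sr = 16000
t = np.arange(sr) / sr
vocal = np.sin(2 * np.pi * 220.0 * t)       # synthetic "vocal": 220 Hz sine
noise = 0.1 * np.random.default_rng(0).standard_normal(sr)
mixture = vocal + noise

# Stage 1 (stand-in for a learned separator): pretend the separator
# recovered the vocal perfectly; a real system would predict a mask.
separated = vocal

# Stage 2: frame-wise F0 estimation on the separator's output.
f0 = autocorr_f0(separated[:2048], sr)      # close to 220 Hz
```

The point of the stacked design is exactly this data flow: the F0 stage never sees the accompaniment, only the separator's estimate of the vocal.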
RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music
Vocal pitch is an important high-level feature in music audio processing.
However, extracting vocal pitch in polyphonic music is more challenging due to
the presence of accompaniment. To eliminate the influence of the accompaniment,
most previous methods adopt music source separation models to obtain clean
vocals from polyphonic music before predicting vocal pitches. As a result, the
performance of vocal pitch estimation is affected by the music source
separation models. To address this issue and directly extract vocal pitches
from polyphonic music, we propose a robust model named RMVPE. This model can
extract effective hidden features and accurately predict vocal pitches from
polyphonic music. The experimental results demonstrate the superiority of RMVPE
in terms of raw pitch accuracy (RPA) and raw chroma accuracy (RCA).
Additionally, experiments conducted with different types of noise show that
RMVPE is robust across all signal-to-noise ratio (SNR) levels. The code of
RMVPE is available at https://github.com/Dream-High/RMVPE.
Comment: This paper has been accepted by INTERSPEECH 202
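The RPA and RCA metrics named in the abstract can be computed with a short, self-contained function. The sketch below follows the usual mir_eval-style definitions (a voiced frame counts as correct if the estimate lies within 50 cents of the reference, with RCA additionally forgiving octave errors); the function name and the example arrays are illustrative, not taken from RMVPE.

```python
import numpy as np

def hz_to_cents(f, ref=10.0):
    return 1200.0 * np.log2(f / ref)

def rpa_rca(ref_hz, est_hz, tol_cents=50.0):
    """Raw pitch accuracy and raw chroma accuracy over voiced frames.

    Frames with a reference of 0 Hz are treated as unvoiced and skipped.
    RCA wraps the pitch difference into one octave (1200 cents) before
    thresholding, so octave errors are not penalized.
    """
    voiced = ref_hz > 0
    ref_c = hz_to_cents(ref_hz[voiced])
    est_c = hz_to_cents(np.maximum(est_hz[voiced], 1e-6))
    diff = est_c - ref_c
    rpa = np.mean(np.abs(diff) <= tol_cents)
    chroma_diff = (diff + 600.0) % 1200.0 - 600.0   # wrap to [-600, 600)
    rca = np.mean(np.abs(chroma_diff) <= tol_cents)
    return rpa, rca

ref = np.array([220.0, 220.0, 440.0, 0.0])  # 0 Hz marks an unvoiced frame
est = np.array([221.0, 110.0, 440.0, 0.0])  # second frame is an octave error
rpa, rca = rpa_rca(ref, est)                # rpa = 2/3, rca = 1.0
```

The octave-error frame (110 Hz against 220 Hz) is why RCA is 1.0 while RPA is 2/3, which is precisely the distinction the paper reports both metrics to capture.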
Motivic Pattern Classification of Music Audio Signals Combining Residual and LSTM Networks
Motivic pattern classification from music audio recordings is a challenging task, all the more so for a cappella flamenco cantes, which are characterized by complex melodic variations, pitch instability, timbre changes, extreme vibrato oscillations, microtonal ornamentations, and noisy recording conditions. Convolutional Neural Networks (CNNs) have proven to be very effective in image classification. Recent work in large-scale audio classification has shown that CNN architectures originally developed for image problems can be applied successfully to audio event recognition and classification with little or no modification to the networks. In this paper, CNN architectures are tested on a more nuanced problem: flamenco cantes intra-style classification using small motivic patterns. A new architecture is proposed that combines the advantages of residual CNNs as feature extractors with a bidirectional LSTM layer that exploits the sequential nature of musical audio data. We present a full end-to-end pipeline for audio music classification that includes a sequential pattern mining technique and a contour simplification method to extract relevant motifs from audio recordings. Mel-spectrograms of the extracted motifs are then used as the input for the different architectures tested. We investigate the usefulness of motivic patterns for the automatic classification of music recordings and the effect of audio length and corpus size on overall classification accuracy. Results show a relative accuracy improvement of up to 20.4% when CNN architectures are trained using acoustic representations of motivic patterns.
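Mel-spectrograms of the extracted motifs serve as the networks' input representation. The following sketch computes one from scratch in NumPy (an STFT with a Hann window followed by a triangular mel filterbank); parameter choices such as n_fft=1024, hop=256, and 40 mel bands are illustrative defaults, not the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=40, fmin=0.0, fmax=None):
    """Triangular mel filters mapping |STFT| bins to n_mels bands."""
    fmax = fmax if fmax is not None else sr / 2
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    edges = mel_to_hz(mels)                       # band edges in Hz
    bin_freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)  # FFT bin centers in Hz
    fb = np.zeros((n_mels, len(bin_freqs)))
    for i in range(n_mels):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        up = (bin_freqs - left) / (center - left)
        down = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(up, down))
    return fb

def mel_spectrogram(x, sr, n_fft=1024, hop=256, n_mels=40):
    """Magnitude STFT (Hann window) projected onto a mel filterbank."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))     # (frames, bins)
    fb = mel_filterbank(sr, n_fft, n_mels)        # (mels, bins)
    return mag @ fb.T                             # (frames, mels)

sr = 16000
t = np.arange(sr) / sr
motif = np.sin(2 * np.pi * 440.0 * t)   # 1 s synthetic "motif"
S = mel_spectrogram(motif, sr)          # shape (59, 40)
```

In practice the motif audio would come from the paper's pattern mining and contour simplification stages, and the resulting mel-spectrogram would be fed to the residual CNN front end.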