Singing voice correction using canonical time warping
Expressive singing voice correction is an appealing but challenging problem. A robust time-warping algorithm that synchronizes two singing recordings can provide a promising solution. We therefore propose to address the problem with canonical time warping (CTW), which aligns amateur singing recordings to professional ones. A new pitch contour is generated from the alignment information, and a pitch-corrected singing voice is synthesized through a vocoder. The objective evaluation shows that CTW is robust against pitch-shifting and time-stretching effects, and the subjective test demonstrates that CTW outperforms the other methods, including DTW and commercial auto-tuning software. Finally, we demonstrate the applicability of the proposed method in a practical, real-world scenario.
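The alignment step can be illustrated with plain dynamic time warping (DTW), the baseline that CTW builds on. The sketch below is an illustrative pure-Python implementation, not the authors' code, and the pitch-contour inputs are hypothetical; CTW additionally learns feature projections before alignment, which is not shown here.

```python
# Minimal dynamic time warping (DTW) between two 1-D feature sequences.
# Illustrative sketch only; CTW extends this by projecting the features
# into a shared space before computing the alignment.

def dtw(x, y):
    """Return the DTW cost and alignment path between sequences x and y."""
    n, m = len(x), len(y)
    INF = float("inf")
    # cost[i][j] = best cumulative cost aligning x[:i] with y[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    # Backtrack to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return cost[n][m], path[::-1]

amateur = [1.0, 2.0, 3.0, 3.0, 2.0]   # hypothetical pitch contour
professional = [1.0, 2.0, 3.0, 2.0]   # hypothetical reference contour
total_cost, path = dtw(amateur, professional)
```

The recovered path maps each amateur frame to a professional frame, which is the alignment information from which a corrected pitch contour could be generated.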
Joint singing voice separation and F0 estimation with deep U-net architectures
Vocal source separation and fundamental frequency estimation in music are tightly related tasks. The outputs of vocal source separation systems have previously been used as inputs to vocal fundamental frequency estimation systems; conversely, vocal fundamental frequency has been used as side information to improve vocal source separation. In this paper, we propose several different approaches for jointly separating vocals and estimating fundamental frequency. We show that joint learning is advantageous for these tasks, and that a stacked architecture which first performs vocal separation outperforms the other configurations considered. Furthermore, the best joint model achieves state-of-the-art results for vocal F0 estimation on the iKala dataset. Finally, we highlight the importance of performing polyphonic, rather than monophonic, vocal F0 estimation in many real-world cases.
Comparison for Improvements of Singing Voice Detection System Based on Vocal Separation
Singing voice detection is the task of identifying which frames of a recording contain the singer's voice. It is one of the main components in music information retrieval (MIR) and is applicable to melody extraction, artist recognition, and music discovery in popular music. Although several methods have been proposed, a more robust and more complete system is needed to improve detection performance. In this paper, our motivation is to provide an extensive comparison across the different stages of singing voice detection. Based on this analysis, a novel method is proposed to build a more efficient singing voice detection system. The proposed system has three main parts. The first is a singing voice separation pre-process that extracts the vocal from the accompanying music; several singing voice separation methods were compared to decide which one to integrate into the detection system. The second is a deep neural network classifier that labels the given frames; different deep models for classification were also compared. The last is a post-process that filters out anomalous frames in the classifier's predictions; a median filter and a Hidden Markov Model (HMM) based filter were compared as post-processes. Through this step-by-step module extension, the different methods were compared and analyzed. Finally, classification performance on two public datasets indicates that the proposed approach, based on the Long-term Recurrent Convolutional Networks (LRCN) model, is a promising alternative.
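As an illustration of the median-filter post-process described above, the sketch below smooths a hypothetical per-frame voice/no-voice prediction sequence. The window length and input labels are assumed example values, not parameters from the paper.

```python
# Median-filter post-processing for per-frame singing voice predictions.
# Isolated misclassified frames (spurious 0s or 1s) are replaced by the
# median of their neighbourhood; longer runs survive unchanged.
# The window length below is an assumed example value.

def median_filter(labels, window=3):
    """Smooth a sequence of 0/1 frame labels with a sliding median."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        neighbourhood = sorted(labels[lo:hi])
        smoothed.append(neighbourhood[len(neighbourhood) // 2])
    return smoothed

# A lone non-vocal frame inside a vocal segment is treated as an anomaly,
# while the genuine non-vocal run at the end is preserved:
frames = [1, 1, 0, 1, 1, 0, 0, 0]
print(median_filter(frames))  # prints [1, 1, 1, 1, 1, 0, 0, 0]
```

The HMM-based alternative compared in the paper instead models label transitions probabilistically rather than with a fixed window.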
A Study of Music Signal Processing Based on Spectral Fluctuations of the Singing Voice Using Harmonic/Percussive Sound Separation
Degree type: Doctoral degree by coursework, University of Tokyo
Harmony Analysis in A Cappella Singing
Speech is produced by the larynx and then modified by the articulators; the resulting signal contains a large amount of useful information. Singing is produced in the same way, albeit with specific acoustic differences: singing contains rhythm and is usually of higher intensity. Singing is almost always accompanied by musical instruments, which generally makes detecting and separating the voice difficult (Kim Hm 2012). A cappella singing is performed without musical accompaniment, making it somewhat easier to retrieve vocal information.
The methods developed to detect information from speech are not new concepts and are commonly applied in almost every device in the average household. Singing processing adapts a large portion of these techniques to detect vocal information from singers, including melody, language, emotion, harmony and pitch. The techniques used in speech and singing processing fall into one of three categories:
1. Time Domain
2. Frequency Domain
3. Other Algorithms
This project will utilise an algorithm from each category; in particular, the Average Magnitude Difference Function (AMDF), Cepstral Analysis and Linear Predictive Coding (LPC). The AMDF is computed by taking the absolute value of the difference between a sample at time (k) and a delayed version of the signal at (k-n), averaged over a frame. It is known to provide relatively good accuracy at low computational cost; however, it is sensitive to background noise (Hui et al. 2006).
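The AMDF just described dips toward zero when the lag matches the pitch period, so the period can be read off as the minimising lag. The sketch below is a minimal illustration; the test tone and the pitch-range search bounds are assumptions, not values from the project.

```python
# Average Magnitude Difference Function (AMDF) pitch-period estimation.
# AMDF(n) = (1/N) * sum_k |x(k) - x(k - n)|: the function approaches
# zero when the lag n matches the signal's pitch period.
import math

def amdf(x, lag):
    """Average absolute difference between x and a copy delayed by `lag`."""
    diffs = [abs(x[k] - x[k - lag]) for k in range(lag, len(x))]
    return sum(diffs) / len(diffs)

def estimate_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] that minimises the AMDF."""
    return min(range(min_lag, max_lag + 1), key=lambda n: amdf(x, n))

# Illustrative test tone: a sinusoid with a 40-sample period.
signal = [math.sin(2 * math.pi * k / 40) for k in range(400)]
period = estimate_period(signal, 20, 60)  # assumed plausible pitch range
```

Restricting the lag search to a plausible pitch range, as above, is standard practice: it avoids spurious minima at multiples of the true period.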
Cepstral Analysis is known for separating a convolved signal into its source and vocal tract components, and it is computationally fast because it relies on the Fourier Transform and its inverse. LPC provides a linear prediction of the current sample from past values of the signal; the resulting predictor and error coefficients are used to derive the spectral envelope for pitch detection.
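The cepstral approach can be sketched concretely: the real cepstrum is the inverse Fourier transform of the log magnitude spectrum, and a harmonic source shows up as a peak at the quefrency equal to the pitch period. The sketch below is an assumed illustration (the test frame, sampling rate and search bounds are not from the project).

```python
# Cepstral pitch detection: the real cepstrum is the inverse Fourier
# transform of the log magnitude spectrum. Harmonic structure in the
# spectrum appears as a peak at the quefrency (lag) equal to the
# pitch period, while the slowly varying vocal tract envelope stays
# near quefrency zero -- which is how the two components separate.
import numpy as np

def cepstral_period(frame, min_lag, max_lag):
    """Return the pitch period (in samples) from the real cepstrum peak."""
    spectrum = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=len(frame))
    lags = np.arange(min_lag, max_lag + 1)
    return int(lags[np.argmax(cepstrum[min_lag:max_lag + 1])])

# Illustrative frame: five harmonics of a 200 Hz tone at 8 kHz sampling,
# so the true pitch period is 8000 / 200 = 40 samples.
sr, f0, n = 8000, 200, 1000
t = np.arange(n) / sr
frame = sum(np.sin(2 * np.pi * f0 * h * t) for h in range(1, 6))
period = cepstral_period(frame, 30, 60)  # assumed plausible pitch range
```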
The project tested the algorithms on 11 tracks containing different harmonic content; each method was compared on speed, accuracy and, where applicable, the number of notes correctly identified. All three algorithms gave relatively good results on single-note tracks, with the LPC algorithm providing the most accurate results. When tested on multi-note tracks and pre-recorded singing tracks, the AMDF and Cepstral Analysis methods performed poorly in terms of accuracy and the number of correctly identified notes. The LPC method performed considerably better, correctly identifying an average of 66.8% of notes.
Principled methods for mixtures processing
This document is my thesis for the habilitation à diriger des recherches, the French diploma required to fully supervise Ph.D. students. It summarizes the research I have done over the last 15 years and outlines the short-term research directions and applications I want to investigate. Regarding my past research, I first describe my work on probabilistic audio modeling, including the separation of Gaussian and α-stable stochastic processes. Then, I discuss my work on deep learning applied to audio, which rapidly turned into a large community-service effort. Finally, I present my contributions to machine learning, with work on hardware compressed sensing and probabilistic generative models. My research programme involves a theoretical part that revolves around probabilistic machine learning, and an applied part that concerns the processing of time series arising in both audio and the life sciences.