Singing voice correction using canonical time warping
Expressive singing voice correction is an appealing but challenging problem. A robust time-warping algorithm that synchronizes two singing recordings can provide a promising solution. We therefore propose to address the problem with canonical time warping (CTW), which aligns amateur singing recordings to professional ones. A new pitch contour is generated from the alignment information, and a pitch-corrected singing voice is synthesized through a vocoder. The objective evaluation shows that CTW is robust against pitch-shifting and time-stretching effects, and the subjective test demonstrates that CTW outperforms the other methods, including DTW and commercial auto-tuning software. Finally, we demonstrate the applicability of the proposed method in a practical, real-world scenario.
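The alignment step can be illustrated with plain dynamic time warping (DTW), the baseline that CTW builds on. The sketch below is an illustrative pure-Python implementation, not the authors' code, and the pitch-contour inputs are hypothetical; CTW additionally learns feature projections before alignment, which is not shown here.

```python
# Minimal dynamic time warping (DTW) between two 1-D feature sequences.
# Illustrative sketch only; CTW extends this by projecting the features
# into a shared space before computing the alignment.

def dtw(x, y):
    """Return the DTW cost and alignment path between sequences x and y."""
    n, m = len(x), len(y)
    INF = float("inf")
    # cost[i][j] = best cumulative cost aligning x[:i] with y[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    # Backtrack to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return cost[n][m], path[::-1]

amateur = [1.0, 2.0, 3.0, 3.0, 2.0]   # hypothetical pitch contour
professional = [1.0, 2.0, 3.0, 2.0]   # hypothetical reference contour
total_cost, path = dtw(amateur, professional)
```

The recovered path maps each amateur frame to a professional frame, which is the alignment information from which a corrected pitch contour could be generated.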
Joint singing voice separation and F0 estimation with deep U-net architectures
Vocal source separation and fundamental frequency estimation in music are tightly related tasks. The outputs of vocal source separation systems have previously been used as inputs to vocal fundamental frequency estimation systems; conversely, vocal fundamental frequency has been used as side information to improve vocal source separation. In this paper, we propose several different approaches for jointly separating vocals and estimating fundamental frequency. We show that joint learning is advantageous for these tasks, and that a stacked architecture which first performs vocal separation outperforms the other configurations considered. Furthermore, the best joint model achieves state-of-the-art results for vocal F0 estimation on the iKala dataset. Finally, we highlight the importance of performing polyphonic, rather than monophonic, vocal F0 estimation in many real-world cases.
Comparison for Improvements of Singing Voice Detection System Based on Vocal Separation
Singing voice detection is the task of identifying which frames of a recording contain the singer's voice. It is one of the main components in music information retrieval (MIR) and is applicable to melody extraction, artist recognition, and music discovery in popular music. Although several methods have been proposed, a more robust and more complete system is needed to improve detection performance. In this paper, our motivation is to provide an extensive comparison across the different stages of singing voice detection. Based on this analysis, a novel method is proposed to build a more efficient singing voice detection system. The proposed system has three main parts. The first is a singing voice separation pre-process that extracts the vocal from the accompanying music; several singing voice separation methods were compared to decide which one to integrate into the detection system. The second is a deep neural network classifier that labels the given frames; different deep models for classification were also compared. The last is a post-process that filters out anomalous frames in the classifier's predictions; a median filter and a Hidden Markov Model (HMM) based filter were compared as post-processes. Through this step-by-step module extension, the different methods were compared and analyzed. Finally, classification performance on two public datasets indicates that the proposed approach, based on the Long-term Recurrent Convolutional Networks (LRCN) model, is a promising alternative.
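As an illustration of the median-filter post-process described above, the sketch below smooths a hypothetical per-frame voice/no-voice prediction sequence. The window length and input labels are assumed example values, not parameters from the paper.

```python
# Median-filter post-processing for per-frame singing voice predictions.
# Isolated misclassified frames (spurious 0s or 1s) are replaced by the
# median of their neighbourhood; longer runs survive unchanged.
# The window length below is an assumed example value.

def median_filter(labels, window=3):
    """Smooth a sequence of 0/1 frame labels with a sliding median."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        neighbourhood = sorted(labels[lo:hi])
        smoothed.append(neighbourhood[len(neighbourhood) // 2])
    return smoothed

# A lone non-vocal frame inside a vocal segment is treated as an anomaly,
# while the genuine non-vocal run at the end is preserved:
frames = [1, 1, 0, 1, 1, 0, 0, 0]
print(median_filter(frames))  # prints [1, 1, 1, 1, 1, 0, 0, 0]
```

The HMM-based alternative compared in the paper instead models label transitions probabilistically rather than with a fixed window.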
A Study of Music Signal Processing Based on Spectral Fluctuations of the Singing Voice Using Harmonic/Percussive Sound Separation
Degree type: Doctoral degree by coursework, University of Tokyo
Harmony Analysis in A Cappella Singing
Speech is produced by the larynx and then modified by the articulators; the resulting signal contains a large amount of useful information. Singing is produced in the same way, albeit with specific acoustic differences: singing contains rhythm and is usually of higher intensity. Singing is almost always accompanied by musical instruments, which generally makes detecting and separating the voice difficult (Kim Hm 2012). A cappella singing is performed without musical accompaniment, making it somewhat easier to retrieve vocal information.
The methods developed to detect information from speech are not new concepts and are commonly applied in almost every device in the average household. Singing processing adapts a large portion of these techniques to detect vocal information from singers, including melody, language, emotion, harmony and pitch. The techniques used in speech and singing processing fall into one of three categories:
1. Time Domain
2. Frequency Domain
3. Other Algorithms
This project will utilise an algorithm from each category; in particular, the Average Magnitude Difference Function (AMDF), Cepstral Analysis and Linear Predictive Coding (LPC). The AMDF is computed by taking the absolute value of the difference between a sample at time (k) and a delayed version of the signal at (k-n), averaged over a frame. It is known to provide relatively good accuracy at low computational cost; however, it is sensitive to background noise (Hui et al. 2006).
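The AMDF just described dips toward zero when the lag matches the pitch period, so the period can be read off as the minimising lag. The sketch below is a minimal illustration; the test tone and the pitch-range search bounds are assumptions, not values from the project.

```python
# Average Magnitude Difference Function (AMDF) pitch-period estimation.
# AMDF(n) = (1/N) * sum_k |x(k) - x(k - n)|: the function approaches
# zero when the lag n matches the signal's pitch period.
import math

def amdf(x, lag):
    """Average absolute difference between x and a copy delayed by `lag`."""
    diffs = [abs(x[k] - x[k - lag]) for k in range(lag, len(x))]
    return sum(diffs) / len(diffs)

def estimate_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] that minimises the AMDF."""
    return min(range(min_lag, max_lag + 1), key=lambda n: amdf(x, n))

# Illustrative test tone: a sinusoid with a 40-sample period.
signal = [math.sin(2 * math.pi * k / 40) for k in range(400)]
period = estimate_period(signal, 20, 60)  # assumed plausible pitch range
```

Restricting the lag search to a plausible pitch range, as above, is standard practice: it avoids spurious minima at multiples of the true period.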
Cepstral Analysis is known for separating a convolved signal into its source and vocal tract components, and it is computationally fast because it relies on the Fourier Transform and its inverse. LPC provides a linear prediction of the current sample from past values of the signal; the resulting predictor and error coefficients are used to derive the spectral envelope for pitch detection.
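The cepstral approach can be sketched concretely: the real cepstrum is the inverse Fourier transform of the log magnitude spectrum, and a harmonic source shows up as a peak at the quefrency equal to the pitch period. The sketch below is an assumed illustration (the test frame, sampling rate and search bounds are not from the project).

```python
# Cepstral pitch detection: the real cepstrum is the inverse Fourier
# transform of the log magnitude spectrum. Harmonic structure in the
# spectrum appears as a peak at the quefrency (lag) equal to the
# pitch period, while the slowly varying vocal tract envelope stays
# near quefrency zero -- which is how the two components separate.
import numpy as np

def cepstral_period(frame, min_lag, max_lag):
    """Return the pitch period (in samples) from the real cepstrum peak."""
    spectrum = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=len(frame))
    lags = np.arange(min_lag, max_lag + 1)
    return int(lags[np.argmax(cepstrum[min_lag:max_lag + 1])])

# Illustrative frame: five harmonics of a 200 Hz tone at 8 kHz sampling,
# so the true pitch period is 8000 / 200 = 40 samples.
sr, f0, n = 8000, 200, 1000
t = np.arange(n) / sr
frame = sum(np.sin(2 * np.pi * f0 * h * t) for h in range(1, 6))
period = cepstral_period(frame, 30, 60)  # assumed plausible pitch range
```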
The project tested the algorithms on 11 tracks containing different harmonic content; each method was compared on speed, accuracy and, where applicable, the number of notes correctly identified. All three algorithms gave relatively good results on single-note tracks, with the LPC algorithm providing the most accurate results. When tested on multi-note tracks and pre-recorded singing tracks, the AMDF and Cepstral Analysis methods performed poorly in terms of accuracy and the number of correctly identified notes. The LPC method performed considerably better, correctly identifying an average of 66.8% of notes.
Principled methods for mixtures processing
This document is my thesis for the habilitation à diriger des recherches, the French diploma required to fully supervise Ph.D. students. It summarizes the research I have done over the last 15 years and outlines the short-term research directions and applications I want to investigate. Regarding my past research, I first describe my work on probabilistic audio modeling, including the separation of Gaussian and α-stable stochastic processes. Then, I discuss my work on deep learning applied to audio, which rapidly turned into a large community-service effort. Finally, I present my contributions to machine learning, with work on hardware compressed sensing and probabilistic generative models. My research programme involves a theoretical part that revolves around probabilistic machine learning, and an applied part that concerns the processing of time series arising in both audio and the life sciences.