Music Visualization Using Source Separated Stereophonic Music
This thesis introduces a music visualization system for stereophonic source-separated music. Music visualization systems are a popular way to represent information from audio signals through computer graphics, and visualization can help people better understand music and its complex, interacting elements. The proposed system extracts pitch, panning, and loudness features from source-separated audio files to create the visual. Most state-of-the-art visualization systems build their visual representation of the music either from the fully mixed final recording, where all of the instruments and vocals are combined into one file, or from digital audio workstation (DAW) data containing multiple independent recordings of individual audio sources. Because original source recordings are not always readily available to the public, music source separation (MSS) can be used to obtain estimated versions of the audio source files. This thesis surveys different approaches to MSS and music visualization and introduces a new music visualization system specifically for source-separated music.
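As a rough illustration of two of the features mentioned in this abstract, the sketch below computes a panning and a loudness value for a single stereo stem with NumPy. The feature definitions used here (energy-balance panning, RMS loudness of the mid signal) are simplifying assumptions for illustration, not necessarily the definitions used in the thesis.

```python
import numpy as np

def stem_features(stereo, eps=1e-12):
    """Panning and loudness from a (2, n_samples) stereo stem.

    Hypothetical helper sketching the kinds of features the thesis
    describes; the actual system's feature definitions may differ.
    """
    left, right = stereo[0], stereo[1]
    # Loudness proxy: RMS of the mid (mono sum) signal.
    mono = 0.5 * (left + right)
    loudness = float(np.sqrt(np.mean(mono ** 2)))
    # Panning in [-1, 1]: energy balance between channels (-1 = hard left).
    e_l, e_r = np.sum(left ** 2), np.sum(right ** 2)
    panning = float((e_r - e_l) / (e_l + e_r + eps))
    return panning, loudness

# Example: a 440 Hz stem mixed entirely to the right channel.
n = 1000
sig = np.sin(2 * np.pi * 440 * np.arange(n) / 44100)
stem = np.stack([np.zeros(n), sig])
pan, loud = stem_features(stem)
```

With per-stem values like these computed frame by frame, each separated source can drive its own visual element.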
Recommended from our members
Joint singing voice separation and F0 estimation with deep U-net architectures
Vocal source separation and fundamental frequency estimation in music are tightly related tasks. The outputs of vocal source separation systems have previously been used as inputs to vocal fundamental frequency estimation systems; conversely, vocal fundamental frequency has been used as side information to improve vocal source separation. In this paper, we propose several different approaches for jointly separating vocals and estimating fundamental frequency. We show that joint learning is advantageous for these tasks, and that a stacked architecture which first performs vocal separation outperforms the other configurations considered. Furthermore, the best joint model achieves state-of-the-art results for vocal-f0 estimation on the iKala dataset. Finally, we highlight the importance of performing polyphonic, rather than monophonic, vocal-f0 estimation for many real-world cases.
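For intuition about the f0-estimation half of the joint task, the toy sketch below estimates a monophonic f0 by autocorrelation peak picking, the kind of classical baseline that benefits from a cleanly separated vocal input. This is only an illustrative stand-in; the paper's deep U-net models are learned, trained jointly with separation, and (for the best real-world results) polyphonic.

```python
import numpy as np

def f0_autocorr(frame, sr, fmin=80.0, fmax=1000.0):
    """Toy monophonic f0 estimate via autocorrelation peak picking.

    Illustrative baseline only, not the paper's method: finds the lag
    (within the plausible pitch range) where the frame best matches a
    shifted copy of itself, and converts that lag to a frequency.
    """
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# A clean 220 Hz "separated vocal" frame.
sr = 16000
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 220.0 * t)
f0 = f0_autocorr(frame, sr)
```

On a clean separated source this works well; on a full mixture the autocorrelation peaks of competing instruments interfere, which is one reason separation and f0 estimation help each other.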
Music Source Separation with Band-split RNN
The performance of music source separation (MSS) models has greatly improved in recent years thanks to the development of novel neural network architectures and training pipelines. However, recent model designs for MSS were mainly motivated by other audio processing tasks or other research fields, while the intrinsic characteristics and patterns of music signals were not fully explored. In this paper, we propose the band-split RNN (BSRNN), a frequency-domain model that explicitly splits the spectrogram of the mixture into subbands and performs interleaved band-level and sequence-level modeling. The bandwidths of the subbands can be chosen using a priori or expert knowledge of the characteristics of the target source, in order to optimize performance for a particular type of target instrument. To make better use of unlabeled data, we also describe a semi-supervised model finetuning pipeline that can further improve the model's performance. Experimental results show that BSRNN trained only on the MUSDB18-HQ dataset significantly outperforms several top-ranking models from the Music Demixing (MDX) Challenge 2021, and that the semi-supervised finetuning stage further improves performance on all four instrument tracks.
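The band-split step described in this abstract can be sketched as a simple slicing of the mixture spectrogram along the frequency axis. The band edges below are arbitrary placeholders rather than the paper's actual band design, and the subsequent interleaved band-level/sequence-level RNN modeling is omitted.

```python
import numpy as np

def band_split(spec, band_edges):
    """Split a (freq_bins, frames) spectrogram into frequency subbands.

    Sketch of BSRNN's band-split step only: the model then projects each
    subband to a fixed-size feature and alternates RNN modeling across
    the band axis and the time axis (not shown here).
    """
    bands = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        bands.append(spec[lo:hi, :])  # one subband of (hi - lo) bins
    return bands

freq_bins, frames = 1025, 100
spec = np.random.rand(freq_bins, frames)
# Finer bands at low frequencies, coarser at high: an a priori choice
# of the kind the abstract mentions (placeholder edges, not the paper's).
edges = [0, 64, 128, 256, 512, 1025]
bands = band_split(spec, edges)
```

Choosing narrow low-frequency bands and wide high-frequency bands reflects where pitched instruments concentrate their energy, which is the sort of expert knowledge the abstract says can guide the band design.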
Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation
Monaural Singing Voice Separation (MSVS) is a challenging task that has been studied for decades. Deep neural networks (DNNs) are the current state-of-the-art methods for MSVS. However, existing DNNs are often designed manually, which is time-consuming and error-prone, and their architectures are usually pre-defined rather than adapted to the training data. To address these issues, we introduce a Neural Architecture Search (NAS) method for the structural design of DNNs for MSVS. Specifically, we propose a new multi-resolution Convolutional Neural Network (CNN) framework for MSVS, namely the Multi-Resolution Pooling CNN (MRP-CNN), which uses pooling operators of various sizes to extract multi-resolution features. Building on NAS, we then develop an evolving framework, namely the Evolving MRP-CNN (E-MRP-CNN), which automatically searches for effective MRP-CNN structures using genetic algorithms, optimized either for a single objective (separation performance alone) or for multiple objectives (both separation performance and model complexity). The multi-objective E-MRP-CNN yields a set of Pareto-optimal solutions, each providing a trade-off between separation performance and model complexity. Quantitative and qualitative evaluations on the MIR-1K and DSD100 datasets demonstrate the advantages of the proposed framework over several recent baselines.
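The various-size pooling idea behind MRP-CNN can be illustrated with a toy multi-resolution average-pooling function in NumPy. The pool sizes below are illustrative stand-ins, not structures found by the paper's genetic search, and the surrounding convolutional layers are omitted.

```python
import numpy as np

def multi_resolution_pool(x, pool_sizes):
    """Average-pool a 2-D feature map at several resolutions.

    Toy version of the various-size pooling idea: each pooled map is
    upsampled back (nearest-neighbour) to the input size so the
    resolutions can be stacked or concatenated as features.
    Assumes both dimensions of x are divisible by every pool size.
    """
    h, w = x.shape
    outs = []
    for p in pool_sizes:
        # Non-overlapping p-by-p average pooling via reshape.
        pooled = x.reshape(h // p, p, w // p, p).mean(axis=(1, 3))
        # Nearest-neighbour upsample back to (h, w).
        up = np.repeat(np.repeat(pooled, p, axis=0), p, axis=1)
        outs.append(up)
    return np.stack(outs)  # (n_resolutions, h, w)

x = np.arange(64, dtype=float).reshape(8, 8)
feats = multi_resolution_pool(x, pool_sizes=[1, 2, 4])
```

Coarser pooling captures broad time-frequency context while pool size 1 preserves detail; a NAS procedure like the paper's can then decide which combination of resolutions is worth its added model complexity.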