Music Visualization Using Source Separated Stereophonic Music
This thesis introduces a music visualization system for stereophonic source-separated music. Music visualization systems are a popular way to represent information from audio signals through computer graphics, and visualization can help people better understand music and its complex, interacting elements. The proposed system extracts pitch, panning, and loudness features from source-separated audio files to create the visual. Most state-of-the-art visualization systems build their visual representation of the music either from the fully mixed final recording, where all of the instruments and vocals are combined into one file, or from digital audio workstation (DAW) data containing multiple independent recordings of individual audio sources. Because original source recordings are not always readily available to the public, music source separation (MSS) can be used to obtain estimated versions of the audio source files. This thesis surveys different approaches to MSS and music visualization and introduces a new music visualization system specifically for source-separated music.
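As a rough illustration of two of the features mentioned in this abstract, the sketch below computes a panning and a loudness value for a single stereo stem with NumPy. The feature definitions used here (energy-balance panning, RMS loudness of the mid signal) are simplifying assumptions for illustration, not necessarily the definitions used in the thesis.

```python
import numpy as np

def stem_features(stereo, eps=1e-12):
    """Panning and loudness from a (2, n_samples) stereo stem.

    Hypothetical helper sketching the kinds of features the thesis
    describes; the actual system's feature definitions may differ.
    """
    left, right = stereo[0], stereo[1]
    # Loudness proxy: RMS of the mid (mono sum) signal.
    mono = 0.5 * (left + right)
    loudness = float(np.sqrt(np.mean(mono ** 2)))
    # Panning in [-1, 1]: energy balance between channels (-1 = hard left).
    e_l, e_r = np.sum(left ** 2), np.sum(right ** 2)
    panning = float((e_r - e_l) / (e_l + e_r + eps))
    return panning, loudness

# Example: a 440 Hz stem mixed entirely to the right channel.
n = 1000
sig = np.sin(2 * np.pi * 440 * np.arange(n) / 44100)
stem = np.stack([np.zeros(n), sig])
pan, loud = stem_features(stem)
```

With per-stem values like these computed frame by frame, each separated source can drive its own visual element.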
Recommended from our members
Joint singing voice separation and F0 estimation with deep U-net architectures
Vocal source separation and fundamental frequency estimation in music are tightly related tasks. The outputs of vocal source separation systems have previously been used as inputs to vocal fundamental frequency estimation systems; conversely, vocal fundamental frequency has been used as side information to improve vocal source separation. In this paper, we propose several different approaches for jointly separating vocals and estimating fundamental frequency. We show that joint learning is advantageous for these tasks, and that a stacked architecture which first performs vocal separation outperforms the other configurations considered. Furthermore, the best joint model achieves state-of-the-art results for vocal-f0 estimation on the iKala dataset. Finally, we highlight the importance of performing polyphonic, rather than monophonic, vocal-f0 estimation for many real-world cases.
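For intuition about the f0-estimation half of the joint task, the toy sketch below estimates a monophonic f0 by autocorrelation peak picking, the kind of classical baseline that benefits from a cleanly separated vocal input. This is only an illustrative stand-in; the paper's deep U-net models are learned, trained jointly with separation, and (for the best real-world results) polyphonic.

```python
import numpy as np

def f0_autocorr(frame, sr, fmin=80.0, fmax=1000.0):
    """Toy monophonic f0 estimate via autocorrelation peak picking.

    Illustrative baseline only, not the paper's method: finds the lag
    (within the plausible pitch range) where the frame best matches a
    shifted copy of itself, and converts that lag to a frequency.
    """
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# A clean 220 Hz "separated vocal" frame.
sr = 16000
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 220.0 * t)
f0 = f0_autocorr(frame, sr)
```

On a clean separated source this works well; on a full mixture the autocorrelation peaks of competing instruments interfere, which is one reason separation and f0 estimation help each other.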
Music Source Separation with Band-split RNN
The performance of music source separation (MSS) models has greatly improved in recent years thanks to the development of novel neural network architectures and training pipelines. However, recent model designs for MSS were mainly motivated by other audio processing tasks or other research fields, while the intrinsic characteristics and patterns of music signals were not fully explored. In this paper, we propose the band-split RNN (BSRNN), a frequency-domain model that explicitly splits the spectrogram of the mixture into subbands and performs interleaved band-level and sequence-level modeling. The bandwidths of the subbands can be chosen using a priori or expert knowledge of the characteristics of the target source, in order to optimize performance for a particular type of target instrument. To make better use of unlabeled data, we also describe a semi-supervised model finetuning pipeline that can further improve the model's performance. Experimental results show that BSRNN trained only on the MUSDB18-HQ dataset significantly outperforms several top-ranking models from the Music Demixing (MDX) Challenge 2021, and that the semi-supervised finetuning stage further improves performance on all four instrument tracks.
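The band-split step described in this abstract can be sketched as a simple slicing of the mixture spectrogram along the frequency axis. The band edges below are arbitrary placeholders rather than the paper's actual band design, and the subsequent interleaved band-level/sequence-level RNN modeling is omitted.

```python
import numpy as np

def band_split(spec, band_edges):
    """Split a (freq_bins, frames) spectrogram into frequency subbands.

    Sketch of BSRNN's band-split step only: the model then projects each
    subband to a fixed-size feature and alternates RNN modeling across
    the band axis and the time axis (not shown here).
    """
    bands = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        bands.append(spec[lo:hi, :])  # one subband of (hi - lo) bins
    return bands

freq_bins, frames = 1025, 100
spec = np.random.rand(freq_bins, frames)
# Finer bands at low frequencies, coarser at high: an a priori choice
# of the kind the abstract mentions (placeholder edges, not the paper's).
edges = [0, 64, 128, 256, 512, 1025]
bands = band_split(spec, edges)
```

Choosing narrow low-frequency bands and wide high-frequency bands reflects where pitched instruments concentrate their energy, which is the sort of expert knowledge the abstract says can guide the band design.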
Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation
Monaural Singing Voice Separation (MSVS) is a challenging task that has been studied for decades. Deep neural networks (DNNs) are the current state-of-the-art methods for MSVS. However, existing DNNs are often designed manually, which is time-consuming and error-prone, and their architectures are usually pre-defined rather than adapted to the training data. To address these issues, we introduce a Neural Architecture Search (NAS) method for the structural design of DNNs for MSVS. Specifically, we propose a new multi-resolution Convolutional Neural Network (CNN) framework for MSVS, namely the Multi-Resolution Pooling CNN (MRP-CNN), which uses pooling operators of various sizes to extract multi-resolution features. Building on NAS, we then develop an evolving framework, namely the Evolving MRP-CNN (E-MRP-CNN), which automatically searches for effective MRP-CNN structures using genetic algorithms, optimized either for a single objective (separation performance alone) or for multiple objectives (both separation performance and model complexity). The multi-objective E-MRP-CNN yields a set of Pareto-optimal solutions, each providing a trade-off between separation performance and model complexity. Quantitative and qualitative evaluations on the MIR-1K and DSD100 datasets demonstrate the advantages of the proposed framework over several recent baselines.
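The various-size pooling idea behind MRP-CNN can be illustrated with a toy multi-resolution average-pooling function in NumPy. The pool sizes below are illustrative stand-ins, not structures found by the paper's genetic search, and the surrounding convolutional layers are omitted.

```python
import numpy as np

def multi_resolution_pool(x, pool_sizes):
    """Average-pool a 2-D feature map at several resolutions.

    Toy version of the various-size pooling idea: each pooled map is
    upsampled back (nearest-neighbour) to the input size so the
    resolutions can be stacked or concatenated as features.
    Assumes both dimensions of x are divisible by every pool size.
    """
    h, w = x.shape
    outs = []
    for p in pool_sizes:
        # Non-overlapping p-by-p average pooling via reshape.
        pooled = x.reshape(h // p, p, w // p, p).mean(axis=(1, 3))
        # Nearest-neighbour upsample back to (h, w).
        up = np.repeat(np.repeat(pooled, p, axis=0), p, axis=1)
        outs.append(up)
    return np.stack(outs)  # (n_resolutions, h, w)

x = np.arange(64, dtype=float).reshape(8, 8)
feats = multi_resolution_pool(x, pool_sizes=[1, 2, 4])
```

Coarser pooling captures broad time-frequency context while pool size 1 preserves detail; a NAS procedure like the paper's can then decide which combination of resolutions is worth its added model complexity.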