A Sequential MUSIC Algorithm for Scatterers Detection in SAR Tomography Enhanced by a Robust Covariance Estimator
Synthetic aperture radar (SAR) tomography (TomoSAR) is an appealing tool for
the extraction of height information of urban infrastructures. Due to the
widespread applications of the MUSIC algorithm in source localization, it is a
suitable solution in TomoSAR when multiple snapshots (looks) are available.
While the classical MUSIC algorithm aims to estimate the whole reflectivity
profile of scatterers, sequential MUSIC algorithms are suited for the detection
of sparse point-like scatterers. In this class of methods, successive
cancellation is performed through orthogonal complement projections on the
MUSIC power spectrum. In this work, a new sequential MUSIC algorithm named
recursive covariance canceled MUSIC (RCC-MUSIC), is proposed. This method
brings higher accuracy in comparison with the previous sequential methods at
the cost of a negligible increase in computational cost. Furthermore, to
improve the performance of RCC-MUSIC, it is combined with the recent method of
covariance matrix estimation called correlation subspace. Utilizing the
correlation subspace method results in a denoised covariance matrix which in
turn, increases the accuracy of subspace-based methods. Several numerical
examples are presented to compare the performance of the proposed method with
the relevant state-of-the-art methods. Simulation results demonstrate the
efficiency of the proposed subspace method in terms of estimation accuracy and
computational load.
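To illustrate the subspace idea behind MUSIC (the classical baseline this work builds on), the sketch below estimates two scatterer elevations from the peaks of a noise-subspace pseudospectrum. All numbers here (acquisition count, spatial frequencies, elevations, noise level) are hypothetical, and the successive-cancellation step that distinguishes RCC-MUSIC is not included.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical TomoSAR setup (not the paper's data): N acquisitions with
# normalized spatial frequencies xi, K looks, two point-like scatterers.
N, K = 20, 100
xi = np.linspace(0.0, 0.5, N)
s_true = np.array([10.0, 25.0])          # illustrative scatterer elevations

def steering(s):
    return np.exp(2j * np.pi * np.outer(xi, np.atleast_1d(s)))

A = steering(s_true)                      # N x 2 steering matrix
S = rng.standard_normal((2, K)) + 1j * rng.standard_normal((2, K))
X = A @ S + 0.1 * (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K)))

R = X @ X.conj().T / K                    # sample covariance over the looks
w, V = np.linalg.eigh(R)                  # eigenvalues in ascending order
En = V[:, :-2]                            # noise subspace (model order 2 assumed)

grid = np.linspace(0.0, 35.0, 701)
P = 1.0 / (np.linalg.norm(En.conj().T @ steering(grid), axis=0) ** 2)

# Keep the two strongest local maxima of the pseudospectrum as estimates.
peaks = np.where((P[1:-1] > P[:-2]) & (P[1:-1] > P[2:]))[0] + 1
est = np.sort(grid[peaks[np.argsort(P[peaks])[-2:]]])
print(est)
```

A sequential variant would instead detect one scatterer at a time and project its contribution out before re-evaluating the spectrum.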
SCNet: Sparse Compression Network for Music Source Separation
Deep learning-based methods have made significant achievements in music
source separation. However, obtaining good results while maintaining a low
model complexity remains challenging in super wide-band music source
separation. Previous works either overlook the differences in subbands or
inadequately address the problem of information loss when generating subband
features. In this paper, we propose SCNet, a novel frequency-domain network to
explicitly split the spectrogram of the mixture into several subbands and
introduce a sparsity-based encoder to model different frequency bands. We use a
higher compression ratio on subbands with less information to improve the
information density and focus on modeling subbands with more information. In
this way, the separation performance can be significantly improved using lower
computational consumption. Experimental results show that the proposed model
achieves a signal-to-distortion ratio (SDR) of 9.0 dB on the MUSDB18-HQ dataset
without using extra data, outperforming state-of-the-art methods.
Specifically, SCNet's CPU inference time is only 48% of that of HT Demucs, one
of the previous state-of-the-art models.
Comment: Accepted by ICASSP 202
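The subband-compression idea can be illustrated with a minimal sketch: split a spectrogram's frequency axis into bands and pool the less informative (here, higher) bands more aggressively. The band edges and compression ratios below are illustrative choices, not SCNet's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical magnitude spectrogram: 256 frequency bins x 100 frames.
spec = np.abs(rng.standard_normal((256, 100)))

# Illustrative band layout: low frequencies keep full resolution,
# higher bands are pooled harder; entries are (start, stop, ratio).
bands = [(0, 64, 1), (64, 160, 2), (160, 256, 4)]

def compress(band, ratio):
    """Average-pool along the frequency axis by the given ratio."""
    f, t = band.shape
    return band.reshape(f // ratio, ratio, t).mean(axis=1)

features = [compress(spec[lo:hi], r) for lo, hi, r in bands]
total_bins = sum(f.shape[0] for f in features)
print(total_bins)  # 64 + 48 + 24 = 136 bins instead of 256
```

The downstream separator then spends its capacity on the densely sampled low band while the compressed high bands contribute fewer bins.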
Learning a feature space for similarity in world music
In this study we investigate computational methods for assessing music similarity in world music styles. We use state-of-the-art audio features to describe musical content in world music recordings. Our music collection is a subset of the Smithsonian Folkways Recordings with audio examples from 31 countries from around the world. Using supervised and unsupervised dimensionality reduction techniques we learn feature representations for music similarity. We evaluate how well music styles separate in this learned space with a classification experiment. We obtained moderate performance classifying the recordings by country. Analysis of misclassifications revealed cases of geographical or cultural proximity. We further evaluate the learned space by detecting outliers, i.e. identifying recordings that stand out in the collection. We use a data mining technique based on Mahalanobis distances to detect outliers and perform a listening experiment in the ‘odd one out’ style to evaluate our findings. We are able to detect, amongst others, recordings of non-musical content as outliers as well as music with distinct timbral and harmonic content. The listening experiment reveals moderate agreement between subjects’ ratings and our outlier estimation.
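The Mahalanobis-distance outlier step can be sketched as follows, assuming a generic feature matrix. The injected outliers, feature dimensionality, and chi-square cutoff are illustrative assumptions, not the study's actual data or settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature matrix: 200 recordings x 8 audio features, with the
# first five rows shifted to act as injected "odd one out" recordings.
X = rng.standard_normal((200, 8))
X[:5] += 6.0

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)   # squared Mahalanobis

# Threshold near the chi-square 99.9% quantile for 8 degrees of freedom.
outliers = np.where(d2 > 26.1)[0]
print(outliers)
```

Unlike Euclidean distance, the covariance-whitened distance does not over-penalize directions in which the collection naturally varies a lot.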
A collaborative filtering method for music recommendation
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.

The present dissertation focuses on proposing and describing a collaborative filtering approach for
Music Recommender Systems. Music Recommender Systems, which are part of a broader class of
Recommender Systems, refer to the task of automatically filtering data to predict the songs that are
more likely to match a particular profile.
So far, academic researchers have proposed a variety of machine learning approaches for determining
which tracks to recommend to users. The most sophisticated among them often
consist of complex learning techniques that can require considerable
computational resources. However, recent research studies have shown that
simpler approaches based on nearest neighbors can lead to
good results, often at much lower computational costs, representing a viable alternative solution to
the Music Recommender System problem.
Throughout this thesis, we conduct offline experiments on a freely-available collection of listening
histories from real users, each containing several different music tracks. We
extract a subset of 10,000 songs to assess the performance of the proposed
system, comparing it with a popularity-based model. Furthermore, we provide a
conceptual overview of the recommendation problem,
describing the state-of-the-art methods, and presenting its current challenges. Finally, the last section
is dedicated to summarizing the essential conclusions and presenting possible future improvements.
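A minimal nearest-neighbor collaborative-filtering sketch, assuming a toy binary listening matrix (the dissertation's actual data, weighting, and similarity choices may differ): users similar to the target user vote for tracks the target has not yet heard.

```python
import numpy as np

# Hypothetical listening matrix: 6 users x 5 tracks (1 = listened).
R = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0],
], dtype=float)

def cosine_sim(M):
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return (M @ M.T) / (norms * norms.T)

sim = cosine_sim(R)                 # user-user cosine similarity
np.fill_diagonal(sim, 0.0)          # a user should not vote for themselves

user = 1                            # recommend for user 1
scores = sim[user] @ R              # similarity-weighted votes per track
scores[R[user] > 0] = -np.inf       # mask tracks already listened to
print(int(np.argmax(scores)))       # index of the top recommended track
```

An item-based variant would transpose the matrix and compute track-track similarities instead, which often scales better when users outnumber items.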
An efficient temporally-constrained probabilistic model for multiple-instrument music transcription
In this paper, an efficient, general-purpose model for multiple instrument polyphonic music transcription is proposed. The model is based on probabilistic latent component analysis and supports the use of sound state spectral templates, which represent the temporal evolution of each note (e.g. attack, sustain, decay). As input, a variable-Q transform (VQT) time-frequency representation is used. Computational efficiency is achieved by supporting the use of pre-extracted and pre-shifted sound state templates. Two variants are presented: without temporal constraints and with hidden Markov model-based constraints controlling the appearance of sound states. Experiments are performed on benchmark transcription datasets: MAPS, TRIOS, MIREX multiF0, and Bach10; results on multi-pitch detection and instrument assignment show that the proposed models outperform the state of the art for multiple-instrument transcription and are more than 20 times faster than a previous sound state-based model. We finally show that a VQT representation can lead to improved multi-pitch detection performance compared with constant-Q representations.
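To illustrate the decomposition at the heart of such models, the sketch below fits activations to a toy spectrogram with fixed, "pre-extracted" templates using multiplicative KL updates (the form the PLCA expectation-maximization steps reduce to when only the activations are estimated). Sizes and data are synthetic; the paper's sound-state templates and HMM constraints are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy decomposition: fixed note templates W (freq x notes), unknown activations H.
F, N, T = 40, 3, 50
W = np.abs(rng.standard_normal((F, N))) + 1e-3    # "pre-extracted" templates
H_true = np.abs(rng.standard_normal((N, T)))
V = W @ H_true                                    # observed toy spectrogram

# Multiplicative KL updates for H with W held fixed; keeping the templates
# frozen is what makes this kind of transcription fast at test time.
H = np.ones((N, T))
for _ in range(200):
    H *= (W.T @ (V / (W @ H))) / W.sum(axis=0)[:, None]

err = np.abs(W @ H - V).mean()
print(err)
```

In a transcription system the rows of `H` would then be thresholded to decide which notes are active in each frame.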
Learnable Front Ends Based on Temporal Modulation for Music Tagging
While end-to-end systems are becoming popular in auditory signal processing,
including automatic music tagging, models that use raw audio as input need a
large amount of data and computational resources in the absence of domain
knowledge.
Inspired by the fact that temporal modulation is regarded as an essential
component in auditory perception, we introduce the Temporal Modulation Neural
Network (TMNN) that combines Mel-like data-driven front ends and temporal
modulation filters with a simple ResNet back end. The structure includes a set
of temporal modulation filters to capture long-term patterns in all frequency
channels. Experimental results show that the proposed front ends surpass
state-of-the-art (SOTA) methods on the MagnaTagATune dataset in automatic music
tagging, and they are also helpful for keyword spotting on speech commands.
Moreover, per-tag performance suggests that temporal modulation especially
improves genre or instrument tags with complex rhythm, as well as mood tags.
Comment: Submitted to ICASSP 202
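A sketch of fixed temporal modulation filtering (the paper's front ends are learned; the cosine kernels, rates, and frame rate here are illustrative assumptions): each filter is convolved along time in every frequency channel, producing one output map per modulation rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mel-like feature map: 64 frequency channels x 200 frames.
feat = rng.standard_normal((64, 200))

def modulation_filter(rate_hz, frame_rate=100.0, length=33):
    """Illustrative bandpass-like kernel: a windowed cosine at one
    modulation rate (a real TMNN would learn these weights)."""
    t = (np.arange(length) - length // 2) / frame_rate
    return np.cos(2 * np.pi * rate_hz * t) * np.hanning(length)

rates = [2.0, 4.0, 8.0, 16.0]        # assumed modulation rates in Hz
out = np.stack([
    np.apply_along_axis(np.convolve, 1, feat, modulation_filter(r), mode='same')
    for r in rates
])                                    # shape: (rates, channels, frames)
print(out.shape)
```

The stacked output plays the role of a multi-channel feature map that a ResNet-style back end could consume.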
BigWavGAN: A Wave-To-Wave Generative Adversarial Network for Music Super-Resolution
Generally, Deep Neural Networks (DNNs) are expected to have high performance
when their model size is large. However, large models have failed to produce
high-quality results commensurate with their scale in music Super-Resolution
(SR). We attribute this to the fact that DNNs cannot learn information
commensurate with their size from standard mean squared error losses. To
unleash the potential of
large DNN models in music SR, we propose BigWavGAN, which incorporates Demucs,
a large-scale wave-to-wave model, with State-Of-The-Art (SOTA) discriminators
and adversarial training strategies. Our discriminator consists of Multi-Scale
Discriminator (MSD) and Multi-Resolution Discriminator (MRD). During inference,
since only the generator is utilized, there are no additional parameters or
computational resources required compared to the baseline model Demucs.
Objective evaluation affirms the effectiveness of BigWavGAN in music SR.
Subjective evaluations indicate that BigWavGAN generates music with
significantly higher perceptual quality than the baseline model. Notably,
BigWavGAN surpasses the SOTA music SR model in both simulated and real-world
scenarios. Moreover, BigWavGAN demonstrates superior generalization to
out-of-distribution data. The ablation study reveals the importance of our
discriminators and training strategies. Samples are available on the demo
page.
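The multi-scale discriminator idea can be sketched as scoring a waveform at several time resolutions. The pooling factors, frame size, and single-linear-layer "discriminator" below are toy stand-ins, not BigWavGAN's actual MSD/MRD architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

wave = rng.standard_normal(16000)           # one second of toy audio

def downsample(x, factor):
    """Average-pool the waveform, as an MSD feeds its sub-discriminators
    progressively coarser views of the same signal."""
    n = len(x) // factor * factor
    return x[:n].reshape(-1, factor).mean(axis=1)

def toy_score(x, w):
    """Stand-in for one sub-discriminator: a linear layer over fixed frames
    followed by tanh, averaged to a scalar realism score."""
    frames = x[: len(x) // 160 * 160].reshape(-1, 160)
    return float(np.tanh(frames @ w).mean())

w = rng.standard_normal(160) * 0.01
scores = [toy_score(downsample(wave, f), w) for f in (1, 2, 4)]
print(len(scores))                          # one score per time scale
```

During adversarial training, the generator's loss would aggregate the scores from every scale, so artifacts visible at any resolution are penalized.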