
    A Sequential MUSIC Algorithm for Scatterers Detection in SAR Tomography Enhanced by a Robust Covariance Estimator

    Synthetic aperture radar (SAR) tomography (TomoSAR) is an appealing tool for the extraction of height information of urban infrastructures. Due to the widespread application of the MUSIC algorithm in source localization, it is a suitable solution in TomoSAR when multiple snapshots (looks) are available. While the classical MUSIC algorithm aims to estimate the whole reflectivity profile of scatterers, sequential MUSIC algorithms are suited for the detection of sparse point-like scatterers. In this class of methods, successive cancellation is performed through orthogonal complement projections on the MUSIC power spectrum. In this work, a new sequential MUSIC algorithm named recursive covariance canceled MUSIC (RCC-MUSIC) is proposed. This method brings higher accuracy than the previous sequential methods at a negligible increase in computational cost. Furthermore, to improve the performance of RCC-MUSIC, it is combined with the recent covariance matrix estimation method called correlation subspace. The correlation subspace method yields a denoised covariance matrix, which in turn increases the accuracy of subspace-based methods. Several numerical examples are presented to compare the performance of the proposed method with the relevant state-of-the-art methods. Simulation results demonstrate the efficiency of the proposed subspace method in terms of estimation accuracy and computational load.
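
    As a rough illustration of the successive-cancellation idea described above (not the authors' RCC-MUSIC, and without the correlation-subspace denoising), the sketch below computes a MUSIC pseudospectrum, picks the strongest scatterer, and removes it from the covariance via an orthogonal-complement projection before repeating; the steering matrix `A` over candidate elevations and all function names are assumptions.

```python
# Simplified sequential MUSIC sketch (illustration only, not RCC-MUSIC).
import numpy as np

def music_spectrum(R, A, n_sources=1):
    """MUSIC pseudospectrum for covariance R and steering matrix A (bins x candidates)."""
    _, eigvecs = np.linalg.eigh(R)                 # eigenvalues in ascending order
    En = eigvecs[:, :-n_sources]                   # noise subspace
    num = np.einsum('ij,ij->j', A.conj(), A).real  # ||a(s)||^2 per candidate elevation
    den = np.linalg.norm(En.conj().T @ A, axis=0) ** 2
    return num / den

def sequential_music(R, A, n_scatterers):
    """Detect scatterers one by one, cancelling each via an
    orthogonal-complement projection before re-evaluating the spectrum."""
    detected = []
    for _ in range(n_scatterers):
        k = int(np.argmax(music_spectrum(R, A)))
        detected.append(k)
        a = A[:, [k]]
        P = np.eye(A.shape[0]) - a @ np.linalg.pinv(a)  # project out the detected steering vector
        R = P @ R @ P.conj().T
    return detected
```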

    SCNet: Sparse Compression Network for Music Source Separation

    Deep learning-based methods have made significant achievements in music source separation. However, obtaining good results while maintaining low model complexity remains challenging in super wide-band music source separation. Previous works either overlook the differences between subbands or inadequately address the problem of information loss when generating subband features. In this paper, we propose SCNet, a novel frequency-domain network that explicitly splits the spectrogram of the mixture into several subbands and introduces a sparsity-based encoder to model the different frequency bands. We use a higher compression ratio on subbands with less information to improve the information density and focus on modeling subbands with more information. In this way, the separation performance can be significantly improved at lower computational cost. Experimental results show that the proposed model achieves a signal-to-distortion ratio (SDR) of 9.0 dB on the MUSDB18-HQ dataset without using extra data, which outperforms state-of-the-art methods. Specifically, SCNet's CPU inference time is only 48% of that of HT Demucs, one of the previous state-of-the-art models. Comment: Accepted by ICASSP 202
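
    A minimal sketch of the band-wise compression idea, assuming a magnitude spectrogram as input; the split points and compression ratios below are illustrative placeholders, not SCNet's actual configuration, and the sparsity-based encoder itself (the per-band network layers) is omitted.

```python
# Illustrative band-wise frequency compression (split points and ratios are assumptions).
import numpy as np

def split_and_compress(spec, band_edges=(0.2, 0.5), ratios=(1, 2, 4)):
    """spec: (freq_bins, frames) magnitude spectrogram.
    Lower bands keep full frequency resolution; higher bands are average-pooled
    more aggressively along frequency (i.e. a higher compression ratio)."""
    n_freq = spec.shape[0]
    edges = [0] + [int(e * n_freq) for e in band_edges] + [n_freq]
    bands = []
    for (lo, hi), r in zip(zip(edges[:-1], edges[1:]), ratios):
        band = spec[lo:hi]
        trim = (band.shape[0] // r) * r            # drop remainder bins so pooling divides evenly
        pooled = band[:trim].reshape(-1, r, band.shape[1]).mean(axis=1)
        bands.append(pooled)
    return bands                                   # per-band features at different resolutions
```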

    Learning a feature space for similarity in world music

    In this study, we investigate computational methods for assessing music similarity in world music styles. We use state-of-the-art audio features to describe musical content in world music recordings. Our music collection is a subset of the Smithsonian Folkways Recordings with audio examples from 31 countries around the world. Using supervised and unsupervised dimensionality reduction techniques, we learn feature representations for music similarity. We evaluate how well music styles separate in this learned space with a classification experiment. We obtained moderate performance classifying the recordings by country. Analysis of misclassifications revealed cases of geographical or cultural proximity. We further evaluate the learned space by detecting outliers, i.e., identifying recordings that stand out in the collection. We use a data mining technique based on Mahalanobis distances to detect outliers and perform a listening experiment in the ‘odd one out’ style to evaluate our findings. We are able to detect, amongst others, recordings of non-musical content as outliers as well as music with distinct timbral and harmonic content. The listening experiment reveals moderate agreement between subjects’ ratings and our outlier estimation.
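
    A minimal sketch of Mahalanobis-distance outlier detection over a matrix of per-recording features, assuming feature extraction and dimensionality reduction have already been performed; the cutoff quantile is an arbitrary choice for illustration.

```python
# Minimal Mahalanobis-distance outlier detection sketch.
import numpy as np

def mahalanobis_outliers(X, quantile=0.99):
    """X: (n_recordings, n_features) learned feature matrix.
    Flags recordings whose squared Mahalanobis distance from the collection
    mean exceeds an empirical quantile cutoff."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))   # pinv guards against a singular covariance
    diff = X - mu
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)  # squared Mahalanobis distances
    cutoff = np.quantile(d2, quantile)
    return np.where(d2 > cutoff)[0], d2
```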

    A collaborative filtering method for music recommendation

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. The present dissertation focuses on proposing and describing a collaborative filtering approach for Music Recommender Systems. Music Recommender Systems, which are part of a broader class of Recommender Systems, address the task of automatically filtering data to predict the songs that are most likely to match a particular profile. So far, academic researchers have proposed a variety of machine learning approaches for determining which tracks to recommend to users. The most sophisticated among them often consist of complex learning techniques that can also require considerable computational resources. However, recent research studies have shown that simpler approaches based on nearest neighbors can lead to good results, often at much lower computational cost, representing a viable alternative solution to the Music Recommender System problem. Throughout this thesis, we conduct offline experiments on a freely available collection of listening histories from real users, each one containing several different music tracks. We extract a subset of 10,000 songs to assess the performance of the proposed system, comparing it with a popularity-based model. Furthermore, we provide a conceptual overview of the recommendation problem, describing the state-of-the-art methods and presenting its current challenges. Finally, the last section is dedicated to summarizing the essential conclusions and presenting possible future improvements.
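
    A hedged sketch of the kind of user-based nearest-neighbor collaborative filtering the dissertation contrasts with a popularity baseline; the binary user-song interaction matrix, cosine similarity, and parameter values are assumptions for illustration, not the dissertation's exact formulation.

```python
# Sketch of user-based k-nearest-neighbor recommendation vs. a popularity baseline.
import numpy as np

def recommend_knn(interactions, user, k=50, top_n=10):
    """interactions: (n_users, n_songs) binary listening matrix (float).
    Scores songs for `user` from the k most similar users (cosine similarity)."""
    norms = np.linalg.norm(interactions, axis=1) + 1e-12
    sims = (interactions @ interactions[user]) / (norms * norms[user])
    sims[user] = -np.inf                           # exclude the target user
    neighbors = np.argsort(sims)[-k:]
    scores = sims[neighbors] @ interactions[neighbors]
    scores[interactions[user] > 0] = -np.inf       # do not re-recommend known songs
    return np.argsort(scores)[::-1][:top_n]

def recommend_popularity(interactions, user, top_n=10):
    """Baseline: most-listened songs the user has not heard yet."""
    counts = interactions.sum(axis=0).astype(float)
    counts[interactions[user] > 0] = -np.inf
    return np.argsort(counts)[::-1][:top_n]
```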

    An efficient temporally-constrained probabilistic model for multiple-instrument music transcription

    In this paper, an efficient, general-purpose model for multiple-instrument polyphonic music transcription is proposed. The model is based on probabilistic latent component analysis and supports the use of sound state spectral templates, which represent the temporal evolution of each note (e.g. attack, sustain, decay). As input, a variable-Q transform (VQT) time-frequency representation is used. Computational efficiency is achieved by supporting the use of pre-extracted and pre-shifted sound state templates. Two variants are presented: one without temporal constraints and one with hidden Markov model-based constraints controlling the appearance of sound states. Experiments are performed on benchmark transcription datasets: MAPS, TRIOS, MIREX multiF0, and Bach10; results on multi-pitch detection and instrument assignment show that the proposed models outperform the state of the art for multiple-instrument transcription and are more than 20 times faster than a previous sound state-based model. We finally show that a VQT representation can lead to improved multi-pitch detection performance compared with constant-Q representations.
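
    As a simplified stand-in for the paper's probabilistic latent component analysis model (no sound states, no HMM constraints), the sketch below decomposes a VQT magnitude representation against fixed, pre-extracted note templates using multiplicative KL-NMF updates, which play the role of the EM updates here; all names are illustrative.

```python
# Template-activation decomposition sketch (KL-NMF stand-in for the PLCA model).
import numpy as np

def estimate_activations(V, W, n_iter=100, eps=1e-12):
    """V: (freq_bins, frames) VQT magnitude; W: (freq_bins, n_templates) fixed,
    pre-extracted note templates. Returns activations H of shape (n_templates, frames)."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)  # multiplicative KL update
    return H   # thresholding / temporal smoothing of H would follow in a full system
```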

    Learnable Front Ends Based on Temporal Modulation for Music Tagging

    While end-to-end systems are becoming popular in auditory signal processing, including automatic music tagging, models using raw audio as input need a large amount of data and computational resources when no domain knowledge is built in. Inspired by the fact that temporal modulation is regarded as an essential component of auditory perception, we introduce the Temporal Modulation Neural Network (TMNN), which combines Mel-like data-driven front ends and temporal modulation filters with a simple ResNet back end. The structure includes a set of temporal modulation filters to capture long-term patterns in all frequency channels. Experimental results show that the proposed front ends surpass state-of-the-art (SOTA) methods on the MagnaTagATune dataset in automatic music tagging, and they are also helpful for keyword spotting on speech commands. Moreover, the per-tag model performance suggests that genre or instrument tags with complex rhythm, as well as mood tags, can especially be improved with temporal modulation. Comment: Submitted to ICASSP 202
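
    A hypothetical fixed-filter sketch of the temporal-modulation stage: the paper's front ends are learnable, whereas here a log-spectrogram is simply convolved along time with a small bank of cosine-windowed modulation filters; the modulation rates, filter length, and frame rate are assumed values.

```python
# Fixed (non-learnable) temporal-modulation filtering sketch.
import numpy as np

def modulation_features(log_spec, rates_hz=(2, 4, 8, 16), frame_rate=100, length=63):
    """log_spec: (channels, frames) log-magnitude spectrogram.
    Returns (n_rates, channels, frames): responses of cosine-windowed
    band-pass modulation filters applied along the time axis of every channel."""
    t = (np.arange(length) - length // 2) / frame_rate
    window = np.hanning(length)
    responses = []
    for rate in rates_hz:
        h = window * np.cos(2 * np.pi * rate * t)   # modulation filter tuned to `rate` Hz
        filtered = np.apply_along_axis(
            lambda x: np.convolve(x, h, mode='same'), axis=1, arr=log_spec)
        responses.append(filtered)
    return np.stack(responses)
```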

    BigWavGAN: A Wave-To-Wave Generative Adversarial Network for Music Super-Resolution

    Generally, Deep Neural Networks (DNNs) are expected to perform better as their model size grows. However, large models have failed to produce high-quality results commensurate with their scale in music Super-Resolution (SR). We attribute this to the fact that DNNs cannot learn information commensurate with their size from standard mean-squared-error losses. To unleash the potential of large DNN models in music SR, we propose BigWavGAN, which incorporates Demucs, a large-scale wave-to-wave model, with State-Of-The-Art (SOTA) discriminators and adversarial training strategies. Our discriminator consists of a Multi-Scale Discriminator (MSD) and a Multi-Resolution Discriminator (MRD). During inference, only the generator is utilized, so no additional parameters or computational resources are required compared to the baseline model Demucs. Objective evaluation affirms the effectiveness of BigWavGAN in music SR. Subjective evaluations indicate that BigWavGAN can generate music with significantly higher perceptual quality than the baseline model. Notably, BigWavGAN surpasses the SOTA music SR model in both simulated and real-world scenarios. Moreover, BigWavGAN demonstrates superior generalization to out-of-distribution data. The conducted ablation study reveals the importance of our discriminators and training strategies. Samples are available on the demo page.
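
    A small, framework-agnostic sketch of how hinge adversarial losses could be aggregated over several sub-discriminators (one score array per MSD scale or MRD resolution); the loss form is an assumption for illustration, not necessarily the paper's exact training recipe, and the discriminators themselves are not reproduced here.

```python
# Hinge adversarial losses aggregated over several sub-discriminator scores
# (one score array per MSD scale / MRD resolution); the loss form is an assumption.
import numpy as np

def hinge_d_loss(real_scores, fake_scores):
    """Discriminator loss summed over all sub-discriminators."""
    return sum(np.mean(np.maximum(0.0, 1.0 - r)) + np.mean(np.maximum(0.0, 1.0 + f))
               for r, f in zip(real_scores, fake_scores))

def hinge_g_loss(fake_scores):
    """Generator adversarial loss: push every sub-discriminator's score up."""
    return sum(-np.mean(f) for f in fake_scores)
```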