5 research outputs found

    Improving Time-Scale Modification of Music Signals Using Harmonic-Percussive Separation

    A major problem in time-scale modification (TSM) of music signals is that percussive transients are often perceptually degraded. To prevent this degradation, some TSM approaches try to explicitly identify transients in the input signal and to handle them in a special way. However, such approaches are problematic for two reasons. First, errors in the transient detection have an immediate influence on the final TSM result and, second, a perceptually transparent preservation of transients is far from trivial. In this paper we present a TSM approach that handles transients implicitly by first separating the signal into a harmonic component and a percussive component, the latter of which typically contains the transients. While the harmonic component is modified with a phase vocoder approach using a large frame size, the noise-like percussive component is modified with a simple time-domain overlap-add technique using a short frame size, which preserves the transients to a high degree without any explicit transient detection.
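
The overlap-add step applied to the percussive component can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Hann window, and the frame and hop sizes are assumptions chosen for illustration, and both the harmonic-percussive separation and the phase vocoder branch are omitted. In the scheme the abstract describes, the final output would be the sum of the phase-vocoder-stretched harmonic component and the OLA-stretched percussive component.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_stretch(x, alpha, frame_len=256, syn_hop=128):
    """Time-stretch the sample list x by factor alpha with plain overlap-add.

    Frames are read from the input every syn_hop / alpha samples and written
    to the output every syn_hop samples. frame_len and syn_hop are
    illustrative values, not taken from the paper.
    """
    ana_hop = syn_hop / alpha
    win = hann(frame_len)
    n_frames = max(1, int((len(x) - frame_len) / ana_hop) + 1)
    out_len = (n_frames - 1) * syn_hop + frame_len
    y = [0.0] * out_len
    norm = [0.0] * out_len  # summed window weights, used for normalization
    for m in range(n_frames):
        a = int(round(m * ana_hop))  # read position in the input
        s = m * syn_hop              # write position in the output
        for i in range(frame_len):
            if a + i < len(x):
                y[s + i] += win[i] * x[a + i]
            norm[s + i] += win[i]
    return [v / w if w > 1e-8 else 0.0 for v, w in zip(y, norm)]


if __name__ == "__main__":
    # Hypothetical percussive-like input: a few isolated clicks.
    clicks = [1.0 if n % 1000 == 500 else 0.0 for n in range(4000)]
    stretched = ola_stretch(clicks, alpha=1.5)
    print(len(clicks), len(stretched))
```

Because each short frame is copied unmodified to its new position, a transient that fits inside one frame keeps its shape, which is the property the abstract relies on for the percussive component.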

    Impulsive spike enhancement on gamelan audio using harmonic percussive separation

    Impulsive spikes often occur in audio recordings of gamelan, and most existing methods only reduce them. This research offers a new method for enhancing impulsive spikes in gamelan music that can reduce, eliminate, or even strengthen them. The process separates the audio into harmonic and percussive components. The percussive component is raised or lowered, and the result is then recombined with the harmonic component. In a similarity test using the Cosine Distance method, spike enhancement through Harmonic Percussive Source Separation (HPSS) yields an average Cosine Distance of 0.0004, indicating that the output is very similar to the original, while the Mean Square Error (MSE) has an average value of 0.0004, likewise indicating a very small average error. In Perceptual Evaluation of Audio Quality (PEAQ) testing, HPSS achieves better quality, with an average Objective Difference Grade (ODG) of -0.24, or Imperceptible.
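
The recombination step and the two similarity measures named in the abstract can be sketched as follows. This is an illustrative sketch only: it assumes the harmonic and percussive components have already been obtained by some HPSS method, and the function names and the `gain` parameter are hypothetical, not taken from the paper.

```python
import math


def remix(harmonic, percussive, gain):
    """Recombine components after scaling the percussive part.

    gain < 1 attenuates spikes, gain = 0 removes them, gain > 1 amplifies them.
    """
    return [h + gain * p for h, p in zip(harmonic, percussive)]


def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length signals."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical components: a sine as the harmonic part, sparse clicks as spikes.
    harmonic = [math.sin(2.0 * math.pi * n / 50.0) for n in range(1000)]
    percussive = [0.8 if n % 250 == 0 else 0.0 for n in range(1000)]
    original = remix(harmonic, percussive, 1.0)
    softened = remix(harmonic, percussive, 0.5)
    print(cosine_distance(original, softened), mse(original, softened))
```

Both measures approach zero as the processed signal approaches the original, which is how the small values reported in the abstract are read as high similarity.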

    From raw audio to a seamless mix : creating an automated DJ system for drum and bass

    We present the open-source implementation of the first fully automatic and comprehensive DJ system, able to generate seamless music mixes using songs from a given library much like a human DJ does. The proposed system is built on top of several enhanced music information retrieval (MIR) techniques, such as beat tracking, downbeat tracking, and structural segmentation, to obtain an understanding of the musical structure. Leveraging the understanding of the music tracks offered by these state-of-the-art MIR techniques, the proposed system surpasses existing automatic DJ systems both in accuracy and completeness. To the best of our knowledge, it is the first fully integrated solution that takes all basic DJing best practices into account, from beat and downbeat matching to identification of suitable cue points, determining a suitable cross-fade profile and compiling an interesting playlist that trades off innovation with continuity. To make this possible, we focused on one specific sub-genre of electronic dance music, namely Drum and Bass. This allowed us to exploit genre-specific properties, resulting in a more robust performance and tailored mixing behavior. Evaluation on a corpus of 160 Drum and Bass songs and an additional hold-out set of 220 songs shows that the used MIR algorithms can annotate 91% of the songs with fully correct annotations (tempo, beats, downbeats, and structure for cue points). On these songs, the proposed song selection process and the implemented DJing techniques enable the system to generate mixes of high quality, as confirmed by a subjective user test in which 18 Drum and Bass fans participated.

    Underdetermined convolutive source separation using two dimensional non-negative factorization techniques

    PhD Thesis. In this thesis, underdetermined audio source separation is considered, that is, estimating the original audio sources from the observed mixture when the number of audio sources is greater than the number of channels. The separation has been carried out using two approaches: blind audio source separation and informed audio source separation. The blind approach depends on the mixture signal only and assumes that the separation is accomplished without any prior information (or with as little as possible) about the sources. The informed approach uses an exemplar in addition to the mixture signal to emulate the targeted speech signal to be separated. Both approaches are based on two-dimensional factorization techniques that decompose the signal into two tensors that are convolved in both the temporal and spectral directions. Both approaches are applied to the convolutive mixture and the highly reverberant convolutive mixture, which are more realistic than the instantaneous mixture. In this work, a novel algorithm based on the nonnegative matrix factor two-dimensional deconvolution (NMF2D) with adaptive sparsity has been proposed to separate audio sources that have been mixed in an underdetermined convolutive mixture. Additionally, a novel Gamma Exponential Process has been proposed for estimating the convolutive parameters and the number of components of the NMF2D/NTF2D, and for initializing the NMF2D parameters. In addition, the effects of different window lengths have been investigated to determine the best-fit model that suits the characteristics of the audio signal. Furthermore, a novel algorithm, namely the fusion of K models of full-rank weighted nonnegative tensor factor two-dimensional deconvolution (K-wNTF2D), has been proposed.
    The K-wNTF2D is developed for its ability to model both the spectral and temporal changes, together with the spatial covariance matrix that addresses the high-reverberation problem. Variable sparsity derived from the Gibbs distribution is optimized under the Itakura-Saito divergence and adapted into the K-wNTF2D model. The tensors of this algorithm are initialized by a novel initialization method, namely the SVD two-dimensional deconvolution (SVD2D). Finally, two novel informed source separation algorithms, namely the semi-exemplar-based algorithm and the exemplar-based algorithm, have been proposed. These algorithms are based on the NMF2D model and the proposed two-dimensional nonnegative matrix partial co-factorization (2DNMPCF) model. The idea of incorporating the exemplar is to inform the proposed separation algorithms about the targeted signal to be separated by initializing their parameters and guiding the separation. The adaptive sparsity is derived for both of the proposed algorithms. Also, a multistage version of the proposed exemplar-based algorithm has been proposed in order to further enhance the separation performance. Results have shown that the proposed separation algorithms are very promising, more flexible, and offer an alternative model to the conventional methods.
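
For orientation, the basic NMF2D model and the Itakura-Saito divergence mentioned in this abstract can be written as below. This is the formulation commonly found in the NMF2D literature, given here as an assumption; the thesis's own notation, its sparsity terms, and its weighted tensor extension (K-wNTF2D) may differ and are not reproduced. Here $V$ is the nonnegative magnitude or power spectrogram, $W^{\tau}$ and $H^{\phi}$ are the nonnegative factors, $\tau$ indexes time shifts, $\phi$ indexes frequency shifts, and $d$ indexes components.

```latex
% NMF2D model: factors convolved along both time (tau) and frequency (phi)
V_{f,t} \;\approx\; \Lambda_{f,t}
  \;=\; \sum_{\tau} \sum_{\phi} \sum_{d} W^{\tau}_{f-\phi,\,d}\; H^{\phi}_{d,\,t-\tau}

% Itakura-Saito divergence between the spectrogram and its model
D_{\mathrm{IS}}(V \,\|\, \Lambda)
  \;=\; \sum_{f,t} \left( \frac{V_{f,t}}{\Lambda_{f,t}}
        \;-\; \log \frac{V_{f,t}}{\Lambda_{f,t}} \;-\; 1 \right)
```

With a single shift in each direction ($\tau = \phi = 0$ only), the model reduces to ordinary NMF, $V \approx WH$.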