
    Template Adaptation for Improving Automatic Music Transcription

    In this work, we propose an automatic music transcription system that adapts dictionary templates so that they closely match the spectral shapes of the instrument sources present in each recording. Current dictionary-based transcription systems keep the input dictionary fixed, so the spectral shapes of the dictionary components might not match those of the test instrument sources. By performing a conservative transcription pre-processing step, the spectral shape of each detected note can be extracted and used to adapt the template dictionary. We propose two variants of adaptive transcription: one for single-instrument and one for multiple-instrument recordings. Experiments are carried out on the MAPS and Bach10 databases. Results in terms of multi-pitch detection and instrument assignment show a clear and consistent improvement when the dictionary is adapted rather than kept fixed.
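    The fixed-versus-adapted dictionary distinction can be illustrated with standard KL-divergence NMF multiplicative updates. This is a minimal sketch, not the paper's exact algorithm: the function names are illustrative, and the adaptation step simply refits the templates to the recording while holding the activations fixed.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-9):
    """Estimate activations H for spectrogram V against a FIXED dictionary W,
    using multiplicative updates that minimize the KL divergence."""
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return H

def adapt_dictionary(V, W, H, n_iter=50, eps=1e-9):
    """Adapt the templates W toward the spectra observed in this recording,
    keeping the activations H fixed -- a simple stand-in for the paper's
    conservative-transcription-then-adapt scheme."""
    W = W.copy()
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W
```

    Because the KL multiplicative update on W is monotonically non-increasing, adapting the dictionary can only tighten the fit of the model to the recording for the given activations.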

    Score-Informed Source Separation for Musical Audio Recordings [An overview]


    A music cognition-guided framework for multi-pitch estimation

    As one of the most important subtasks of automatic music transcription (AMT), multi-pitch estimation (MPE), which predicts the fundamental frequencies in the frames of an audio recording, has been studied extensively during the past decade. However, how music perception and cognition can inform MPE has not yet been thoroughly investigated. Motivated by this, this paper demonstrates how to effectively detect the fundamental frequencies and harmonic structure of polyphonic music using a cognitive framework. Inspired by cognitive neuroscience, an integration of the constant-Q transform with a state-of-the-art matrix factorization method, shift-invariant probabilistic latent component analysis (SI-PLCA), is proposed to resolve the polyphonic short-time magnitude log-spectra for multi-pitch estimation and source-specific feature extraction. Perceptual cues of rhythm, harmonic periodicity, and instrument timbre guide the characterization of contiguous notes and of the relationship between the fundamental and harmonic frequencies when detecting pitches from the SI-PLCA output. In the experiments, we compare the proposed MPE system to a number of existing state-of-the-art approaches (seven non-deep learning methods and four deep learning methods) on three widely used datasets (MAPS, Bach10, and TRIOS) in terms of F-measure (F1). The results show that the proposed method provides the best overall performance against the existing methods.
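    The factorization at the core of this approach can be sketched with plain (non-shift-invariant) PLCA fitted by expectation-maximization; the full SI-PLCA model additionally lets the spectral templates shift along the log-frequency axis of a constant-Q representation, which this simplified sketch omits. All names here are illustrative.

```python
import numpy as np

def plca(V, n_components, n_iter=100, eps=1e-12, seed=0):
    """Simplified PLCA: model the normalized spectrogram as
    P(f, t) = sum_z P(z) P(f|z) P(t|z), fitted by EM."""
    rng = np.random.default_rng(seed)
    V = V / V.sum()                    # treat the spectrogram as a joint distribution
    F, T = V.shape
    Pz = np.full(n_components, 1.0 / n_components)
    Pf = rng.random((F, n_components)); Pf /= Pf.sum(axis=0, keepdims=True)
    Pt = rng.random((T, n_components)); Pt /= Pt.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z | f, t) for every time-frequency bin
        joint = Pz[None, None, :] * Pf[:, None, :] * Pt[None, :, :]   # (F, T, Z)
        post = joint / (joint.sum(axis=2, keepdims=True) + eps)
        # M-step: re-estimate the marginals from expected counts
        counts = V[:, :, None] * post
        Pf = counts.sum(axis=1)
        Pt = counts.sum(axis=0)
        Pz = counts.sum(axis=(0, 1))
        Pf /= Pf.sum(axis=0, keepdims=True) + eps
        Pt /= Pt.sum(axis=0, keepdims=True) + eps
        Pz /= Pz.sum() + eps
    return Pz, Pf, Pt
```

    In a transcription pipeline, the learned P(f|z) columns play the role of pitch/timbre templates and P(t|z) of their time-varying activations.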

    Speech denoising using nonnegative matrix factorization and neural networks

    The main goal of this research is to perform source separation of single-channel mixed signals so that we obtain a clean representation of each source. In our case, we are specifically concerned with separating a speaker's speech from background noise, so we deal with single-channel mixtures of speech with stationary, semi-stationary, and non-stationary noise types. This is what we define as speech denoising. Our goal is to build a system that takes a noisy speech signal as input and outputs the clean speech with as little distortion and as few artifacts as possible. The model requires no prior information about the speaker or the background noise, and separation runs in real time since the input signal can be fed frame by frame. Such a model can be used in speech recognition systems to improve recognition accuracy in noisy environments. Two methods were adopted for this purpose: nonnegative matrix factorization (NMF) and neural networks. Experiments were conducted to compare their performance for speech denoising. For each method, we compared the case where prior information about both the speaker and the noise is available against using just a general speech dictionary. Further experiments compared different architectures and parameter settings within each approach.
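    The NMF branch of such a system is commonly realized as follows: concatenate pre-learned speech and noise dictionaries, fit only the activations on the noisy mixture, and apply a Wiener-style soft mask. This is a generic sketch of that standard recipe, under the assumption of magnitude spectrograms and KL-divergence updates; it is not the thesis's exact configuration.

```python
import numpy as np

def kl_nmf_activations(V, W, n_iter=150, eps=1e-9):
    """Fit activations H for magnitude spectrogram V, dictionary W held fixed."""
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return H

def nmf_denoise(V_mix, W_speech, W_noise, eps=1e-9):
    """Supervised NMF denoising: decompose the mixture on the concatenated
    speech/noise dictionaries, then apply a Wiener-style soft mask."""
    W = np.hstack([W_speech, W_noise])
    H = kl_nmf_activations(V_mix, W)
    Ks = W_speech.shape[1]
    S = W_speech @ H[:Ks]        # speech magnitude estimate
    N = W_noise @ H[Ks:]         # noise magnitude estimate
    mask = S / (S + N + eps)     # in [0, 1] per time-frequency bin
    return mask * V_mix
```

    Because activations for a single frame can be fitted independently, the same decomposition also works frame by frame, which is what makes the real-time, streaming use described above possible.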

    Applying source separation to music

    Separation of existing audio into remixable elements is very useful for repurposing music audio. Applications include upmixing video soundtracks to surround sound (e.g., home-theater 5.1 systems), facilitating music transcription, enabling better mashups and remixes for disc jockeys, and rebalancing the levels of multiple instruments or voices recorded simultaneously to a single track. In this chapter, we provide an overview of the algorithms and approaches designed to address the challenges and opportunities in music. Where applicable, we also introduce commonalities and links to source separation for video soundtracks, since many musical scenarios involve them (e.g., YouTube recordings of live concerts, movie soundtracks). While space prohibits describing every method in detail, we include detail on representative music-specific algorithms and approaches not covered in other chapters. The intent is to give the reader a high-level understanding of the workings of key exemplars of the source separation approaches applied in this domain.

    Unsupervised Sound Source Separation Using a Generalized Dirichlet Prior

    Doctoral dissertation, Department of Transdisciplinary Studies, Graduate School of Convergence Science and Technology, Seoul National University, February 2018 (advisor: Kyogu Lee). Music source separation aims to extract and reconstruct the individual instrument sounds that constitute a mixture. It has received a great deal of attention recently due to its importance in audio signal processing. In addition to stand-alone applications such as noise reduction and instrument-wise equalization, source separation can directly affect the performance of various music information retrieval algorithms when used as a pre-processing step. However, conventional source separation algorithms have failed to show satisfactory performance, especially without the aid of spatial or musical information about the target source. To deal with this problem, we have focused on the spectral and temporal characteristics of sounds that can be observed in the spectrogram. Spectrogram decomposition is a commonly used technique to exploit such characteristics; however, only a few simple characteristics such as sparsity have been usable so far, because most characteristics are difficult to express in the form of algorithms. The main goal of this thesis is to investigate the possibility of using a generalized Dirichlet prior to constrain the spectral/temporal bases of spectrogram decomposition algorithms. As the generalized Dirichlet prior is not only simple but also flexible in its usage, it enables us to utilize more characteristics within spectrogram decomposition frameworks.
    From harmonic-percussive sound separation to harmonic instrument sound separation, we apply the generalized Dirichlet prior to various tasks and verify its flexible usage as well as its fine performance.
    Table of contents:
    Chapter 1 Introduction: 1.1 Motivation; 1.2 Task of interest (1.2.1 Number of channels; 1.2.2 Utilization of side-information); 1.3 Approach (1.3.1 Spectrogram decomposition with constraints; 1.3.2 Dirichlet prior; 1.3.3 Contribution); 1.4 Outline of the thesis
    Chapter 2 Theoretical Background: 2.1 Probabilistic latent component analysis; 2.2 Non-negative matrix factorization; 2.3 Dirichlet prior (2.3.1 PLCA framework; 2.3.2 NMF framework); 2.4 Summary
    Chapter 3 Harmonic-Percussive Source Separation Using Harmonicity and Sparsity Constraints: 3.1 Introduction; 3.2 Proposed method (3.2.1 Formulation of harmonic-percussive separation; 3.2.2 Relation to Dirichlet prior); 3.3 Performance evaluation (3.3.1 Sample problem; 3.3.2 Qualitative analysis; 3.3.3 Quantitative analysis); 3.4 Summary
    Chapter 4 Exploiting Continuity/Discontinuity of Basis Vectors in Spectrogram Decomposition for Harmonic-Percussive Sound Separation: 4.1 Introduction; 4.2 Proposed method (4.2.1 Characteristics of harmonic and percussive components; 4.2.2 Derivation of the proposed method; 4.2.3 Algorithm interpretation); 4.3 Performance evaluation (4.3.1 Parameter setting; 4.3.2 Toy examples; 4.3.3 SiSEC 2015 dataset; 4.3.4 QUASI dataset; 4.3.5 Subjective performance evaluation; 4.3.6 Audio demo); 4.4 Summary
    Chapter 5 Informed Approach to Harmonic Instrument Sound Separation: 5.1 Introduction; 5.2 Proposed method (5.2.1 Excitation-filter model; 5.2.2 Linear predictive coding; 5.2.3 Spectrogram decomposition procedure); 5.3 Performance evaluation (5.3.1 Experimental settings; 5.3.2 Performance comparison; 5.3.3 Envelope extraction); 5.4 Summary
    Chapter 6 Blind Approach to Harmonic Instrument Sound Separation: 6.1 Introduction; 6.2 Proposed method; 6.3 Performance evaluation (6.3.1 Weight optimization; 6.3.2 Performance comparison; 6.3.3 Effect of envelope similarity); 6.4 Summary
    Chapter 7 Conclusion and Future Work: 7.1 Contributions; 7.2 Future work (7.2.1 Application to multi-channel audio environments; 7.2.2 Application to vocal separation; 7.2.3 Application to various audio source separation tasks)
    Bibliography; Abstract (in Korean)
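    The core mechanism of a Dirichlet prior on decomposition bases is easy to state in isolation. Under a symmetric Dirichlet prior, the MAP re-estimate of a discrete distribution adds alpha - 1 to each expected count and clips at zero, so alpha < 1 zeroes out small entries (sparsity) while alpha > 1 smooths. This generic sketch of that one update step is an assumption-laden illustration, not the thesis's generalized formulation.

```python
import numpy as np

def dirichlet_map_normalize(counts, alpha, eps=1e-12):
    """MAP re-estimate of a discrete distribution (e.g. one spectral or
    temporal basis in PLCA) under a symmetric Dirichlet prior.

    alpha < 1 pushes small entries to exactly zero (sparsifying);
    alpha > 1 smooths the distribution toward uniform."""
    p = np.maximum(counts + alpha - 1.0, 0.0)
    s = p.sum()
    if s <= eps:                        # degenerate case: fall back to uniform
        return np.full_like(counts, 1.0 / counts.size)
    return p / s
```

    Plugging this step into the M-step of a PLCA-style decomposition, in place of plain normalization, is what lets a chosen characteristic (here, sparsity) constrain the learned bases.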