    Structured Dropout for Weak Label and Multi-Instance Learning and Its Application to Score-Informed Source Separation

    Many success stories involving deep neural networks are instances of supervised learning, where available labels power gradient-based learning methods. Creating such labels, however, can be expensive and thus there is increasing interest in weak labels which only provide coarse information, with uncertainty regarding time, location or value. Using such labels often leads to considerable challenges for the learning process. Current methods for weak-label training often employ standard supervised approaches that additionally reassign or prune labels during the learning process. The information gain, however, is often limited as only the importance of labels where the network already yields reasonable results is boosted. We propose treating weak-label training as an unsupervised problem and use the labels to guide the representation learning to induce structure. To this end, we propose two autoencoder extensions: class activity penalties and structured dropout. We demonstrate the capabilities of our approach in the context of score-informed source separation of music

    Score-Informed Source Separation for Musical Audio Recordings [An overview]

    (c) 2014 IEEE.

    A review on initialization methods for nonnegative matrix factorization: Towards omics data experiments

    Nonnegative Matrix Factorization (NMF) has acquired a relevant role in knowledge extraction because non-negativity applies to both bases and weights, which allows meaningful interpretations and is consistent with the natural human part-based learning process. Nevertheless, most NMF algorithms are iterative, so initialization methods affect convergence behaviour, the quality of the final solution, and NMF performance in terms of the residual of the cost function. Studies on the impact of NMF initialization techniques have been conducted for text or image datasets, but very few considerations can be found in the literature for biological datasets, even though NMF has largely demonstrated its usefulness in better understanding biological mechanisms with omics datasets. This paper presents the state of the art on NMF initialization schemes along with some initial considerations on the impact of initialization methods when microarrays (a simple instance of omics data) are evaluated with NMF mechanisms. Using a series of measures to qualitatively examine the biological information extracted by a given NMF scheme, it preliminarily appears that some information (e.g., represented by genes) can be extracted regardless of the initialization scheme used
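
    The dependence on initialization mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the standard Lee-Seung multiplicative updates for the Frobenius cost, synthetic random data instead of microarrays, and two hypothetical initialization schemes (plain uniform random, and uniform random rescaled to roughly match the mean of the data). The function name `nmf_multiplicative` and all sizes are illustrative assumptions.

```python
import numpy as np

def nmf_multiplicative(V, W0, H0, n_iter=200, eps=1e-9):
    """Lee-Seung multiplicative updates for min ||V - W H||_F with W, H >= 0.

    W0 and H0 are the caller-supplied initial factors; eps avoids division by zero.
    Returns the final factors and the residual after every iteration.
    """
    W, H = W0.copy(), H0.copy()
    residuals = []
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update weights with bases fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update bases with weights fixed
        residuals.append(float(np.linalg.norm(V - W @ H)))
    return W, H, residuals

rng = np.random.default_rng(0)
V = rng.random((30, 20))   # synthetic nonnegative data (assumption, not omics data)
rank = 4

# Scheme A (assumed): plain uniform random initialization.
W_a, H_a = rng.random((30, rank)), rng.random((rank, 20))

# Scheme B (assumed): uniform random, rescaled so that W @ H roughly matches mean(V).
scale = 2.0 * np.sqrt(V.mean() / rank)
W_b, H_b = scale * rng.random((30, rank)), scale * rng.random((rank, 20))

_, _, res_a = nmf_multiplicative(V, W_a, H_a)
W, H, res_b = nmf_multiplicative(V, W_b, H_b)

print("scheme A: first/last residual", res_a[0], res_a[-1])
print("scheme B: first/last residual", res_b[0], res_b[-1])
```

    Comparing the two residual traces shows how the starting point changes the convergence path even when the update rule and the data are identical.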

    Probabilistic generative modeling of speech

    Speech processing refers to a set of tasks that involve speech analysis and synthesis. Most speech processing algorithms model a subset of speech parameters of interest and blur the rest using signal processing techniques and feature extraction. However, evidence shows that many speech parameters can be more accurately estimated if they are modeled jointly; speech synthesis also benefits from joint modeling. This thesis proposes a probabilistic generative model for speech called the Probabilistic Acoustic Tube (PAT). The highlights of the model are threefold. First, it is among the very first works to build a complete probabilistic model for speech. Second, it has a well-designed model for the phase spectrum of speech, which has been hard to model and often neglected. Third, it models the AM-FM effects in speech, which are perceptually significant but often ignored in frame-based speech processing algorithms. Experiments show that the proposed model has good potential for a number of speech processing tasks

    Automatic Music Transcription: Breaking the Glass Ceiling

    Automatic music transcription is considered by many to be the Holy Grail in the field of music signal analysis. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. In order to overcome the limited performance of transcription systems, algorithms have to be tailored to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available are a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information across different methods and musical aspects

    AUDIO QUERY-BASED MUSIC SOURCE SEPARATION

    Thesis (Master's) -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Digital Information Convergence, 2020. 8. Kyogu Lee.
    In recent years, music source separation has been one of the most intensively studied research areas in music information retrieval. Advances in deep learning have led to substantial progress in music source separation performance. However, most previous studies are restricted to separating a single instrument or a limited number of sources, such as vocals, drums, bass, and other, and scalability has received little attention. In this study, we propose a network for audio query-based music source separation that can explicitly encode the source information from a query signal regardless of the number and/or kind of target signals. The proposed method consists of a Query-net and a Separator: given a query and a mixture, the Query-net encodes the query into the latent space, and the Separator estimates a mask conditioned on the latent vector, which is then applied to the mixture for separation. The Separator can also generate masks using latent vectors obtained from the training samples, allowing separation in the absence of a query. We evaluate our method on the MUSDB18 dataset and the Slakh dataset, and experimental results show that the proposed method can separate multiple sources with a single network. In addition, through further investigation of the latent space we demonstrate that our method can generate continuous outputs via latent vector interpolation.
    Contents: Chapter 1 Introduction (1.1 Research Background; 1.2 Research Objectives). Chapter 2 Background Theory and Related Work (2.1 Background Theory: 2.1.1 Source Separation, 2.1.2 Variational Autoencoder; 2.2 Related Work: 2.2.1 Source Separation Research, 2.2.2 Research in Other Fields). Chapter 3 Proposed Method (3.1 Audio Query-Based Source Separation; 3.2 Training: 3.2.1 Training Data Construction, 3.2.2 Training Objective; 3.3 Testing). Chapter 4 Experiments (4.1 Datasets; 4.2 Experimental Details; 4.3 Behavior of the Query Encoding Network on New Samples; 4.4 Separating Specific Instruments Using Audio Queries; 4.5 Source Separation Using Latent Vector Interpolation; 4.6 Analysis of the Effect of Latent Vectors on Separation Performance; 4.7 Comparative Experiment on Separation Using Fine-Grained Class Information; 4.8 Iterative Separation; 4.9 Quantitative Evaluation). Chapter 5 Conclusion (5.1 Significance of the Research; 5.2 Future Work). Abstract.
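
    The data flow described in this abstract (query encoded to a latent vector, mask conditioned on that vector, mask multiplied with the mixture) can be sketched as follows. This is a toy illustration, not the thesis's architecture: the random untrained weights, the scale-and-shift conditioning, the array sizes, and the function names `query_net` and `separator` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, D = 16, 10, 4   # frequency bins, time frames, latent size (arbitrary choices)

# Hypothetical, untrained weights standing in for the two trained networks.
W_query = rng.normal(size=(D, F))
W_gain = rng.normal(size=(F, D))
W_bias = rng.normal(size=(F, D))

def query_net(query_spec):
    """Encode a query magnitude spectrogram of shape (F, T) into a latent vector (D,)."""
    return np.tanh(W_query @ query_spec.mean(axis=1))

def separator(mixture_spec, z):
    """Return a sigmoid mask with the mixture's shape, conditioned on latent z.

    The per-frequency scale-and-shift conditioning is an assumption of this sketch.
    """
    gain = (W_gain @ z)[:, None]
    bias = (W_bias @ z)[:, None]
    logits = gain * np.log1p(mixture_spec) + bias
    return 1.0 / (1.0 + np.exp(-logits))

# Synthetic nonnegative "spectrograms" in place of real audio.
mixture = np.abs(rng.normal(size=(F, T)))
query_a = np.abs(rng.normal(size=(F, T)))
query_b = np.abs(rng.normal(size=(F, T)))

# Query-based separation: encode the query, predict a mask, apply it to the mixture.
z_a, z_b = query_net(query_a), query_net(query_b)
mask_a = separator(mixture, z_a)
estimate_a = mask_a * mixture

# Latent interpolation: a point between two query encodings yields its own mask.
z_mid = 0.5 * (z_a + z_b)
estimate_mid = separator(mixture, z_mid) * mixture
```

    Because the mask lies between 0 and 1, each estimate is bounded elementwise by the mixture; in the query-free setting the abstract mentions, a stored latent vector would simply replace the output of `query_net`.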

    Automatic music transcription: challenges and future directions

    Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available are a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects