    Singing Voice Separation in Music Using Intrinsic Characteristics

    Doctoral thesis (Ph.D.), Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies, February 2018. 이교구 (Kyogu Lee). Singing voice separation (SVS) is the task, or the method, of decomposing a music signal into the singing voice and its accompanying instruments. Its uses range from preprocessing, where it helps extract the musical information carried by a particular source, to applications that use the separated sources directly, such as vocal training. This thesis examines the intrinsic properties of singing voice and accompaniment and applies them to develop SVS algorithms that advance the state of the art. 
In particular, this thesis concentrates on the following setting, which it calls `characteristics-based.' First, the music signal is assumed to be monaural, that is, a single-channel recording. This is a harder condition than multichannel recording because spatial information cannot be used in the separation procedure. Second, the thesis focuses on an unsupervised approach that does not use machine learning to estimate the source models from training data; the models are instead derived from low-level characteristics and built into the objective function. Finally, no external information such as lyrics, a score, or user guidance is provided. Unlike in blind source separation, however, the classes of the target sources (singing voice and accompaniment) are known in SVS, which makes it possible to analyze the properties of each. Three characteristics are primarily discussed. Continuity, along the spectral or temporal dimension, refers to the smoothness of a source along that dimension: spectral continuity is related to timbre, while temporal continuity represents how stably a sound is sustained. Low-rankness reflects how well structured a signal is, that is, whether it can be represented as low-rank data, and sparsity describes how sparsely or densely the signal is distributed in time and frequency. The thesis discusses two SVS approaches based on these characteristics. The first builds on continuity and sparsity and extends harmonic-percussive sound separation (HPSS). Whereas the conventional algorithm separates the singing voice with a two-stage HPSS, the proposed one uses a single-stage procedure with an additional sparse residual term in the objective function. The second approach is based on low-rankness and sparsity. 
Assuming that the accompaniment can be represented by a low-rank model whereas the singing voice has a sparse distribution, the conventional algorithm separates the sources using robust principal component analysis (RPCA). This thesis discusses generalizations and extensions of RPCA aimed specifically at SVS, including replacing the trace norm and l1-norm with the Schatten p-norm and lp-norm, respectively, scale compression, and the use of spectral distribution. The presented algorithms were evaluated on various datasets and in challenges, and achieved results better than or comparable to those of state-of-the-art algorithms.
Contents:
Chapter 1 Introduction: 1.1 Motivation; 1.2 Applications; 1.3 Definitions and keywords; 1.4 Evaluation criteria; 1.5 Topics of interest; 1.6 Outline of the thesis
Chapter 2 Background: 2.1 Spectrogram-domain separation framework; 2.2 Approaches for singing voice separation (2.2.1 Characteristics-based approach; 2.2.2 Spatial approach; 2.2.3 Machine learning-based approach; 2.2.4 Informed approach); 2.3 Datasets and challenges (2.3.1 Datasets; 2.3.2 Challenges)
Chapter 3 Characteristics of music sources: 3.1 Introduction; 3.2 Spectral/temporal continuity (3.2.1 Continuity of a spectrogram; 3.2.2 Continuity of musical sources); 3.3 Low-rankness (3.3.1 Low-rankness of a spectrogram; 3.3.2 Low-rankness of musical sources); 3.4 Sparsity (3.4.1 Sparsity of a spectrogram; 3.4.2 Sparsity of musical sources); 3.5 Experiments; 3.6 Summary
Chapter 4 Singing voice separation using continuity and sparsity: 4.1 Introduction; 4.2 SVS using two-stage HPSS (4.2.1 Harmonic-percussive sound separation; 4.2.2 SVS using two-stage HPSS); 4.3 Proposed algorithm; 4.4 Experimental evaluation (4.4.1 MIR-1k Dataset; 4.4.2 Beach Boys Dataset; 4.4.3 iKala dataset in MIREX 2014); 4.5 Conclusion
Chapter 5 Singing voice separation using low-rankness and sparsity: 5.1 Introduction; 5.2 SVS using robust principal component analysis (5.2.1 Robust principal component analysis; 5.2.2 Optimization for RPCA using augmented Lagrangian multiplier method; 5.2.3 SVS using RPCA); 5.3 SVS using generalized RPCA (5.3.1 Generalized RPCA using Schatten p- and lp-norm; 5.3.2 Comparison of pRPCA with robust matrix completion; 5.3.3 Optimization method of pRPCA; 5.3.4 Discussion of the normalization factor for λ; 5.3.5 Generalized RPCA using scale compression; 5.3.6 Experimental results); 5.4 SVS using RPCA and spectral distribution (5.4.1 RPCA with weighted l1-norm; 5.4.2 Proposed method: SVS using wRPCA; 5.4.3 Experimental results using DSD100 dataset; 5.4.4 Comparison with state-of-the-arts in SiSEC 2016; 5.4.5 Discussion); 5.5 Summary
Chapter 6 Conclusion and Future Work: 6.1 Conclusion; 6.2 Contributions; 6.3 Future work (6.3.1 Discovering various characteristics for SVS; 6.3.2 Expanding to other SVS approaches; 6.3.3 Applying the characteristics for deep learning models)
Bibliography
Abstract (in Korean)
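
The low-rank plus sparse model named in the abstract above can be written out as a short worked formulation. This is a sketch of the standard RPCA objective and of one plausible form of its Schatten p-/lp-norm generalization; the symbols M, L, S, λ, σ_i and the exact form of the generalized objective are assumptions for illustration, not taken from the thesis.

```latex
% M: magnitude spectrogram (m x n), L: low-rank part (accompaniment),
% S: sparse part (singing voice). All symbols are illustrative assumptions.
\begin{aligned}
\text{RPCA:}\quad
  & \min_{L,\,S}\ \|L\|_{*} + \lambda \|S\|_{1}
    \quad \text{s.t.}\quad M = L + S, \\
\text{generalized:}\quad
  & \min_{L,\,S}\ \|L\|_{S_p}^{p} + \lambda \|S\|_{p}^{p}
    \quad \text{s.t.}\quad M = L + S, \\
\text{where}\quad
  & \|L\|_{S_p}^{p} = \sum_{i} \sigma_i(L)^{p}, \qquad
    \|S\|_{p}^{p} = \sum_{i,j} |S_{ij}|^{p}.
\end{aligned}
```

Here σ_i(L) are the singular values of L. Setting p = 1 recovers the trace norm and the l1-norm, so the first line is a special case of the second. In the standard RPCA literature a common default is λ = 1/√max(m, n).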

    Interferometric synthetic aperture sonar system supported by satellite

    Doctoral thesis. Electrical and Computer Engineering. Faculdade de Engenharia. Universidade do Porto. 200

    Sensor Signal and Information Processing II

    In the current age of information explosion, newly invented sensors and software are tightly integrated with our everyday lives. Many sensor processing algorithms have incorporated some form of computational intelligence as part of their core problem-solving framework. These algorithms have the capacity to generalize, to discover knowledge for themselves, and to learn new information whenever unseen data are captured. The primary aim of sensor processing is to develop techniques to interpret, understand, and act on the information contained in the data. The interest of this book is in developing intelligent signal processing in order to pave the way for smart sensors. This involves mathematical advancement of nonlinear signal processing theory and its applications, extending far beyond traditional techniques. The book bridges the boundary between theory and application, developing novel theoretically inspired methodologies that target both longstanding and emergent signal processing applications. The topics range from phishing detection to integration of terrestrial laser scanning, and from fault diagnosis to bio-inspired filtering. The book will appeal to established practitioners, along with researchers and students in the emerging field of smart sensor processing.

    Wavelet Theory

    The wavelet is a powerful mathematical tool that plays an important role in science and technology. This book looks at some of the most creative and popular applications of wavelets, including biomedical signal processing, image processing, communication signal processing, Internet of Things (IoT), acoustical signal processing, financial market data analysis, energy and power management, and COVID-19 pandemic measurements and calculations. The editor's personal interest is in applying the wavelet transform to identify time-domain changes in signals and their corresponding frequency components, and in improving power amplifier behavior.

    Copyright Policies of Scientific Publications in Institutional Repositories: The Case of INESC TEC

    The progressive transformation of scientific practices, driven by the development of new Information and Communication Technologies (ICT), has made it possible to increase access to information, moving gradually towards an opening of the research cycle. In the long term this will help resolve a difficulty researchers have faced: barriers, whether geographical or financial, that limit the conditions of access. Although scientific production is dominated largely by major commercial publishers and is subject to the rules they impose, the Open Access movement, whose first public declaration, the Budapest Declaration (BOAI), dates from 2002, proposes significant changes that benefit authors and readers. The movement has been gaining importance in Portugal since 2003, when the first institutional repository at the national level was established. Institutional repositories emerged as a tool for disseminating the scientific production of an institution, opening up research results both before publication and peer review (preprint) and after (postprint), and consequently increasing the visibility of the work of a researcher and of the respective institution. The study presented here, which analyzed the copyright policies of the most relevant scientific publications of INESC TEC, showed not only that publishers are increasingly adopting policies that allow publications to be self-archived in institutional repositories, but also that a good deal of awareness-raising work remains to be done, not only among researchers but also within the institution and society as a whole. 
A set of recommendations was produced, including the implementation of an institutional policy that encourages self-archiving in the repository of publications developed within the institution; it serves as a starting point for greater appreciation of the scientific production of INESC TEC.

    A Robust Sparse Adaptive Filtering Algorithm with a Correntropy Induced Metric Constraint for Broadband Multi-Path Channel Estimation

    A robust sparse least-mean mixture-norm (LMMN) algorithm is proposed, and its performance is assessed in the context of estimating a broadband multi-path wireless channel. The proposed algorithm is obtained by integrating a correntropy-induced metric (CIM) penalty into the cost function of the conventional LMMN algorithm, and is denoted the CIM-based LMMN (CIM-LMMN) algorithm. It is derived in detail within the kernel framework. The CIM-LMMN update equation provides a zero attractor that draws the non-dominant channel coefficients toward zero, and it gives a tradeoff between sparsity and estimation misalignment. The channel estimation behavior is investigated over a broadband sparse multi-path wireless channel, and the simulation results are compared with those of the least mean square/fourth (LMS/F), least mean square (LMS), and least mean fourth (LMF) algorithms and of recently developed sparse channel estimation algorithms. The results show that the CIM-LMMN algorithm is robust and outperforms the recently developed sparse LMMN algorithms and the other sparse channel estimation algorithms considered, in terms of both convergence speed and channel estimation misalignment when estimating a sparse channel.
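
The update described in the abstract above (a mixed-norm error term plus a CIM-based zero attractor) can be illustrated with a minimal sketch. It assumes the textbook LMMN gain `e * (delta + (1 - delta) * e**2)` and a Gaussian-kernel CIM penalty. The function names `cim_lmmn` and `simulate`, all parameter values, and the synthetic two-tap channel are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def cim_lmmn(x, d, num_taps, mu=0.005, delta=0.8, rho=1e-5, sigma=0.05):
    """Estimate FIR channel taps with an LMMN update plus a CIM zero attractor.

    All default values are illustrative assumptions, not values from the paper.
    """
    w = [0.0] * num_taps          # current channel estimate
    buf = [0.0] * num_taps        # most recent inputs, newest first
    # Scale of the CIM gradient for a Gaussian kernel of width sigma.
    scale = 1.0 / (num_taps * sigma ** 3 * math.sqrt(2.0 * math.pi))
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        e = dn - sum(wi * bi for wi, bi in zip(w, buf))
        # Mixed-norm gain: delta weights the LMS part, 1 - delta the LMF part.
        gain = mu * e * (delta + (1.0 - delta) * e * e)
        # The last term pulls small coefficients toward zero and is
        # negligible for coefficients much larger than sigma.
        w = [
            wi + gain * bi
            - rho * scale * wi * math.exp(-wi * wi / (2.0 * sigma * sigma))
            for wi, bi in zip(w, buf)
        ]
    return w


def simulate(num_samples=5000, seed=0):
    """Run the filter on a synthetic sparse channel (hypothetical test setup)."""
    rng = random.Random(seed)
    h = [0.0] * 16
    h[2], h[9] = 0.8, -0.5        # two dominant taps, the rest exactly zero
    x = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    d = []
    for n in range(num_samples):
        clean = sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        d.append(clean + rng.gauss(0.0, 0.01))   # small additive noise
    return h, cim_lmmn(x, d, num_taps=len(h))
```

Calling `simulate()` returns the true taps and the estimate, so the two lists can be compared tap by tap. In this sketch `rho` is kept small on purpose: a large zero attractor can hold a genuinely nonzero tap near zero, which is the sparsity versus misalignment tradeoff the abstract mentions.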

    Biomedical Photoacoustic Imaging and Sensing Using Affordable Resources

    The overarching goal of this book is to provide a current picture of the latest developments in biomedical photoacoustic imaging and sensing in an affordable setting, including advances in light sources and delivery, acoustic detection, and image reconstruction and processing algorithms. The book includes 14 chapters from globally prominent researchers, covering a comprehensive spectrum of photoacoustic imaging topics from technology developments and novel imaging methods to preclinical and clinical studies, predominantly in a cost-effective setting. Affordability is undoubtedly an important factor to consider in the coming years to help translate photoacoustic imaging to clinics around the globe. This first-ever book focused on biomedical photoacoustic imaging and sensing using affordable resources is thus timely, especially as the technique is undergoing an exciting transition from benchtop to bedside. Given its scope, the book will appeal to scientists and engineers in academia and industry, as well as medical experts interested in the clinical applications of photoacoustic imaging.