AUDIO QUERY-BASED MUSIC SOURCE SEPARATION
Thesis (M.S.) -- Seoul National University, Graduate School of Convergence Science and Technology, Department of Digital Information Convergence, August 2020. Advisor: Kyogu Lee.
In recent years, music source separation has been one of the most intensively studied research areas in music information retrieval. Improvements in deep learning have led to substantial progress in music source separation performance. However, most previous studies are restricted to separating a small, fixed set of sources, such as vocals, drums, bass, and "other".
In this study, we propose a network for audio query-based music source separation
that can explicitly encode the source information from a query signal regardless of the
number and/or kind of target signals. The proposed method consists of a Query-net
and a Separator: given a query and a mixture, the Query-net encodes the query into the
latent space, and the Separator estimates a mask conditioned on the latent vector, which is then applied to the mixture for separation. The Separator can also generate masks from latent vectors obtained from training samples, allowing separation in the absence of a query.
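A minimal sketch of this query-conditioned masking pipeline, including the latent-interpolation behavior described below. The `query_net` and `separator` functions here are toy stand-ins with random weights, not the thesis's trained networks; spectrogram sizes and the sigmoid masking rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: F frequency bins, T time frames, D latent size.
F, T, D = 64, 32, 8

# Hypothetical stand-ins for the trained Query-net and Separator weights.
W_enc = rng.standard_normal((D, F))
W_cond = rng.standard_normal((F, D))

def query_net(query_spec):
    """Encode a query spectrogram into a latent vector (time-averaged)."""
    return np.tanh(W_enc @ query_spec.mean(axis=1))

def separator(mixture_spec, z):
    """Estimate a soft mask conditioned on latent z; apply it to the mixture."""
    bias = W_cond @ z  # per-frequency conditioning from the latent vector
    mask = 1.0 / (1.0 + np.exp(-(np.log(mixture_spec + 1e-8) + bias[:, None])))
    return mask * mixture_spec  # masked (separated) spectrogram

query = np.abs(rng.standard_normal((F, T)))    # magnitude spectrogram of query
mixture = np.abs(rng.standard_normal((F, T)))  # magnitude spectrogram of mix

z = query_net(query)
separated = separator(mixture, z)

# The mask lies in (0, 1), so the estimate never exceeds the mixture energy.
assert separated.shape == mixture.shape
assert np.all(separated <= mixture + 1e-12)

# Query-less operation: reuse a latent vector stored from training samples.
z_stored = np.tanh(rng.standard_normal(D))
separated_no_query = separator(mixture, z_stored)

# Continuous outputs via interpolation between two encoded queries.
z2 = query_net(np.abs(rng.standard_normal((F, T))))
for alpha in (0.0, 0.5, 1.0):
    z_mix = (1 - alpha) * z + alpha * z2
    _ = separator(mixture, z_mix)
```

The conditioning scheme (adding a latent-derived bias before a sigmoid) is one simple choice; the thesis's Separator may condition differently.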
We evaluate our method on the MUSDB18 dataset and the Slakh dataset, and experimental results show that the proposed method can separate multiple sources with a
single network. In addition, through further investigation of the latent space, we demonstrate that our method can generate continuous outputs via latent vector interpolation.

Table of Contents
Chapter 1: Introduction
1.1 Research Background
1.2 Research Goals
Chapter 2: Background Theory and Related Work
2.1 Background Theory
2.1.1 Music Source Separation
2.1.2 Variational Autoencoder
2.2 Related Work
2.2.1 Research on Music Source Separation
2.2.2 Research in Other Fields
Chapter 3: Proposed Method
3.1 Audio Query-based Music Source Separation
3.2 Training
3.2.1 Training Data Composition
3.2.2 Training Objectives
3.3 Testing
Chapter 4: Experiments
4.1 Datasets
4.2 Experimental Details
4.3 Query-net Behavior on Unseen Samples
4.4 Separating Specific Instruments with Audio Queries
4.5 Source Separation via Latent Vector Interpolation
4.6 Analysis of the Effect of Latent Vectors on Separation Performance
4.7 Comparative Experiments Using Fine-grained Class Information
4.8 Iterative Separation
4.9 Quantitative Evaluation
Chapter 5: Conclusion
5.1 Summary of the Study
5.2 Future Work
ABSTRACT
Self-Supervised Music Source Separation Using Vector-Quantized Source Category Estimates
Music source separation is focused on extracting distinct sonic elements from
composite tracks. Historically, many methods have been grounded in supervised
learning, necessitating labeled data, which is occasionally constrained in its
diversity. More recent methods have delved into N-shot techniques that utilize
one or more audio samples to aid in the separation. However, a challenge with
some of these methods is the necessity for an audio query during inference,
making them less suited for genres with varied timbres and effects. This paper
offers a proof-of-concept for a self-supervised music source separation system
that eliminates the need for audio queries at inference time. In the training
phase, while it adopts a query-based approach, we introduce a modification by
substituting the continuous embedding of query audios with Vector Quantized
(VQ) representations. Trained end-to-end with up to N classes as determined by
the VQ's codebook size, the model seeks to effectively categorise instrument
classes. During inference, the input is partitioned into N sources, with some
potentially left unutilized based on the mix's instrument makeup. This
methodology suggests an alternative avenue for considering source separation
across diverse music genres. We provide examples and additional results online.
Comment: 4 pages, 2 figures, 1 table; accepted at the 37th Conference on Neural Information Processing Systems (2023), Machine Learning for Audio Workshop.
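The core substitution this paper describes (replacing a continuous query embedding with its nearest entry in a size-N codebook) can be sketched as follows. The dimensions, the random codebook, and the `vector_quantize` helper are illustrative assumptions; in the paper the codebook is learned end-to-end.

```python
import numpy as np

rng = np.random.default_rng(1)

D, N = 8, 4  # embedding dimension; codebook size = max source categories

# Stand-in for a learned VQ codebook: one row per instrument category.
codebook = rng.standard_normal((N, D))

def vector_quantize(z):
    """Replace a continuous embedding with its nearest codebook entry."""
    dists = np.sum((codebook - z) ** 2, axis=1)  # squared L2 to each code
    idx = int(np.argmin(dists))
    return codebook[idx], idx

z = rng.standard_normal(D)  # continuous embedding of a query audio
z_q, idx = vector_quantize(z)

# The quantized vector is exactly one codebook row, i.e. a discrete category.
assert 0 <= idx < N
assert np.array_equal(z_q, codebook[idx])

# At inference, each of the N codes stands for one candidate source, so a
# mixture is partitioned into at most N outputs; codes absent from the
# mix's instrument makeup simply go unused.
chosen = [vector_quantize(rng.standard_normal(D))[1] for _ in range(16)]
assert set(chosen) <= set(range(N))
```

Quantizing to a finite codebook is what removes the need for an audio query at inference: the N discrete categories themselves enumerate the candidate sources.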
Gendering the Virtual Space: Sonic Femininities and Masculinities in Contemporary Top 40 Music
This dissertation analyzes vocal placement (the apparent location of a voice in the virtual space created by a recording) and its relationship to gender. When listening to a piece of recorded music through headphones or stereo speakers, one hears various sound sources as though they were located in a virtual space (Clarke 2013). For instance, a specific vocal performance, once manipulated by various technologies in a recording studio, might evoke a concert hall, an intimate setting, or an otherworldly space. The placement of the voice within this space is one of the central musical parameters through which listeners ascribe cultural meanings to popular music.
I develop an original methodology for analyzing vocal placement in recorded popular music. Combining close listening with music information retrieval tools, I precisely locate a voice's placement in virtual space according to five parameters: (1) Width, (2) Pitch Height, (3) Prominence, (4) Environment, and (5) Layering. I use the methodology to conduct close and distant readings of vocal placement in twenty-first-century Anglo-American popular music. First, an analysis of "Love the Way You Lie" (2010), by Eminem feat. Rihanna, showcases how the methodology can be used to support close readings of individual songs. Through my analysis, I suggest that Rihanna's wide vocal placement evokes a nexus of conflicting emotions in the wake of domestic violence. Eminem's narrow placement, conversely, expresses anger, frustration, and violence. Second, I use the analytical methodology to conduct a larger-scale study of vocal placement in a corpus of 113 post-2008 Billboard chart-topping collaborations between two or more artists. By stepping away from close readings of individual songs, I show how gender stereotypes are engineered en masse in the popular music industry. I show that women artists are generally assigned vocal placements that are wider, more layered, and more reverberated than those of men. This vocal placement configuration, exemplified in "Love the Way You Lie," creates a sonic contrast that presents women's voices as ornamental and diffuse, and men's voices as direct and relatable. I argue that these contrasting vocal placements sonically construct a gender binary, exemplifying one of the ways in which dichotomous conceptions of gender are reinforced through the sound of popular music.
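Of the five parameters, Width is the most directly computable from a stereo signal. As an illustrative proxy only (the dissertation's actual measurement procedure is not specified here), a side-to-mid energy ratio distinguishes a centered voice from a wide one:

```python
import numpy as np

def stereo_width(left, right, eps=1e-12):
    """Side-to-mid energy ratio in [0, 1]: 0 = mono (centered), 1 = fully wide."""
    mid = 0.5 * (left + right)    # content shared by both channels
    side = 0.5 * (left - right)   # content that differs between channels
    e_mid = np.sum(mid ** 2)
    e_side = np.sum(side ** 2)
    return e_side / (e_mid + e_side + eps)

t = np.linspace(0, 1, 8000)
voice = np.sin(2 * np.pi * 220 * t)

# Identical channels: the voice sits dead center, so width is (near) zero.
assert stereo_width(voice, voice) < 1e-6

# Out-of-phase channels: no mid content at all, so width approaches one.
assert stereo_width(voice, -voice) > 0.999
```

Real vocal-placement analysis would combine a measure like this with the other four parameters (reverberation for Environment, relative level for Prominence, and so on), each requiring its own estimator.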