6 research outputs found

    AUDIO QUERY-BASED MUSIC SOURCE SEPARATION

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์œตํ•ฉ๊ณผํ•™๊ธฐ์ˆ ๋Œ€ํ•™์› ๋””์ง€ํ„ธ์ •๋ณด์œตํ•ฉํ•™๊ณผ, 2020. 8. ์ด๊ต๊ตฌ.์ตœ๊ทผ ๋ช‡ ๋…„ ๋™์•ˆ, ์Œ์•… ์Œ์› ๋ถ„๋ฆฌ๋Š” ์Œ์•… ์ •๋ณด ๊ฒ€์ƒ‰ ๋ถ„์•ผ์—์„œ ๊ฐ€์žฅ ํ™œ๋ฐœํ•˜๊ฒŒ ์—ฐ๊ตฌ ๊ฐ€ ์ด๋ฃจ์–ด์ง„ ๋ถ„์•ผ ์ค‘ ํ•˜๋‚˜์ด๋‹ค. ๋˜ํ•œ ๋”ฅ ๋Ÿฌ๋‹์˜ ๋ฐœ์ „์œผ๋กœ ์ธํ•ด ์Œ์•… ์Œ์› ๋ถ„๋ฆฌ ์„ฑ๋Šฅ์€ ํฐ ํญ์œผ๋กœ ํ–ฅ์ƒํ–ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋Œ€๋ถ€๋ถ„์˜ ์ด์ „ ์—ฐ๊ตฌ๋“ค์€ ๋‹จ์ผ ์•…๊ธฐ ๋˜๋Š” ๋ณด์ปฌ, ๋“œ๋Ÿผ, ๋ฒ  ์ด์Šค์™€ ๊ฐ™์€ ์ œํ•œ๋œ ์ˆ˜์˜ ์Œ์›์„ ๋ถ„๋ฆฌํ•˜๋Š”๋ฐ ๊ทธ์ณค์œผ๋ฉฐ, ํ™•์žฅ์„ฑ์— ๋Œ€ํ•œ ์—ฐ๊ตฌ๋Š” ๋งŽ์ด ์ด๋ฃจ์–ด์ง€์ง€ ์•Š์•˜๋‹ค. ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ์˜ค๋””์˜ค ์ฟผ๋ฆฌ ๊ธฐ๋ฐ˜ ์Œ์› ๋ถ„๋ฆฌ๋ฅผ ์œ„ํ•ด ๋ชฉํ‘œ ์‹ ํ˜ธ์˜ ์ˆ˜ ๋˜๋Š” ์ข…๋ฅ˜์— ๊ด€๊ณ„์—†์ด ์ฟผ๋ฆฌ ์‹ ํ˜ธ๋กœ๋ถ€ํ„ฐ ์†Œ์Šค์˜ ์ •๋ณด๋ฅผ ์ธ์ฝ”๋”ฉํ•  ์ˆ˜ ์žˆ๋Š” ๋„คํŠธ์›Œํฌ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์€ ์ฟผ๋ฆฌ ์ธ์ฝ”๋”ฉ ๋„คํŠธ์›Œํฌ์™€ ์Œ์› ๋ถ„๋ฆฌ ๋„คํŠธ์›Œํฌ๋กœ ๊ตฌ์„ฑ๋œ๋‹ค. ์˜ค๋””์˜ค ์ฟผ ๋ฆฌ์™€ ํ•ฉ์„ฑ ์Œ์›์ด ์ฃผ์–ด์ง€๋ฉด ์ฟผ๋ฆฌ ์ธ์ฝ”๋”ฉ ๋„คํŠธ์›Œํฌ๋Š” ์ฟผ๋ฆฌ๋ฅผ ์ž ์žฌ ๊ณต๊ฐ„์œผ๋กœ ์ธ์ฝ”๋”ฉ ํ•˜๊ณ , ์Œ์› ๋ถ„๋ฆฌ ๋„คํŠธ์›Œํฌ๋Š” ์ž ์žฌ ๋ฒกํ„ฐ์— ์˜ํ•ด ์ปจ๋””์…”๋‹๋œ ๋งˆ์Šคํฌ๋ฅผ ์ถœ๋ ฅํ•˜๋ฉฐ, ์ด ๋งˆ์Šคํฌ๋Š” ํ•ฉ์„ฑ ์Œ์›์— ๊ณฑํ•ด์ ธ ์Œ์›์„ ๋ถ„๋ฆฌํ•œ๋‹ค. ๋˜ํ•œ ์Œ์› ๋ถ„๋ฆฌ ๋„คํŠธ์›Œํฌ๋Š” ํ•™์Šต ์ƒ˜ํ”Œ์—์„œ ์–ป์–ด์ง„ ์ž ์žฌ ๋ฒกํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์˜ค๋””์˜ค ์ฟผ๋ฆฌ๊ฐ€ ์ฃผ์–ด์ง€์ง€ ์•Š์€ ํ™˜๊ฒฝ์—์„œ๋„ ๋™์ž‘ํ•  ์ˆ˜ ์žˆ๋‹ค. ์ œ์•ˆํ•œ ๊ธฐ๋ฒ•์˜ ํ‰๊ฐ€๋ฅผ ์œ„ํ•ด MUSDB18๊ณผ Slakh์„ ์ด์šฉํ•˜๋ฉฐ, ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์ด ๋‹จ์ผ ๋„คํŠธ์›Œํฌ๋กœ ์—ฌ๋Ÿฌ ์†Œ์Šค๋ฅผ ๋ถ„๋ฆฌํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค. ๋˜ํ•œ, ์ž ์žฌ ๊ณต๊ฐ„์— ๋Œ€ํ•œ ๋ถ„์„์„ ํ†ตํ•ด ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์ด ์ž ์žฌ ๋ฒกํ„ฐ์˜ ๋ณด๊ฐ„์„ ํ†ตํ•ด ์—ฐ์†์ ์ธ ์ถœ๋ ฅ์„ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹คIn recent years, music source separation has been one of the most intensively studied research areas in music information retrieval. Improvements in deep learning lead to a big progress in music source separation performance. However, most of the previous studies are restricted to separating a few limited number of sources, such as vocals, drums, bass, and other. In this study, we propose a network for audio query-based music source separation that can explicitly encode the source information from a query signal regardless of the number and/or kind of target signals. The proposed method consists of a Query-net and a Separator: given a query and a mixture, the Query-net encodes the query into the latent space, and the Separator estimates masks conditioned by the latent vector, which is then applied to the mixture for separation. The Separator can also generate masks using the latent vector from the training samples, allowing separation in the absence of a query. We evaluate our method on the MUSDB18 dataset and the Slakh dataset, and experimental results show that the proposed method can separate multiple sources with a single network. 
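    As a rough illustration of the pipeline described above (not the thesis's actual implementation; the module names, the FiLM-style conditioning, and all shapes are assumptions for this sketch), the following PyTorch-style code encodes a query spectrogram into a latent vector, conditions a mask estimator on that vector, and interpolates two latents to produce an intermediate output.

```python
# Minimal sketch of audio query-based separation (hypothetical, not the thesis code).
# Assumes magnitude-spectrogram inputs of shape (batch, freq, time).
import torch
import torch.nn as nn

class QueryNet(nn.Module):
    """Encodes a query spectrogram into a latent vector."""
    def __init__(self, n_freq=513, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_freq, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # pool over the time axis
        )
        self.proj = nn.Linear(256, latent_dim)

    def forward(self, query_spec):            # (B, F, T)
        h = self.net(query_spec).squeeze(-1)  # (B, 256)
        return self.proj(h)                   # (B, latent_dim)

class Separator(nn.Module):
    """Estimates a mask for the mixture, conditioned on the query latent."""
    def __init__(self, n_freq=513, latent_dim=32):
        super().__init__()
        self.film = nn.Linear(latent_dim, 2 * n_freq)  # FiLM-style scale and shift
        self.body = nn.Sequential(
            nn.Conv1d(n_freq, n_freq, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(n_freq, n_freq, kernel_size=3, padding=1), nn.Sigmoid(),
        )

    def forward(self, mix_spec, z):            # (B, F, T), (B, latent_dim)
        scale, shift = self.film(z).chunk(2, dim=-1)   # each (B, F)
        x = mix_spec * scale.unsqueeze(-1) + shift.unsqueeze(-1)
        mask = self.body(x)                    # values in [0, 1]
        return mask * mix_spec                 # masked (separated) spectrogram

# Separation with a query, plus a continuous output via latent interpolation:
query_net, separator = QueryNet(), Separator()
mix = torch.rand(1, 513, 100)
z_a = query_net(torch.rand(1, 513, 50))        # e.g. a vocal query
z_b = query_net(torch.rand(1, 513, 50))        # e.g. a bass query
est = separator(mix, z_a)
blend = separator(mix, 0.5 * z_a + 0.5 * z_b)  # interpolated latent
```

    Blending z_a and z_b in the last line corresponds to the continuous outputs that the abstract attributes to latent vector interpolation.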
    Table of contents:
    Chapter 1: Introduction (1.1 Research Background; 1.2 Research Objectives)
    Chapter 2: Background Theory and Related Work (2.1 Background Theory: 2.1.1 Source Separation, 2.1.2 Variational Autoencoder; 2.2 Related Work: 2.2.1 Source Separation Studies, 2.2.2 Studies in Other Fields)
    Chapter 3: Proposed Method (3.1 Audio Query-Based Source Separation; 3.2 Training: 3.2.1 Training Data Construction, 3.2.2 Training Objective; 3.3 Testing)
    Chapter 4: Experiments (4.1 Datasets; 4.2 Experimental Setup; 4.3 Query-net Behavior on New Samples; 4.4 Separating Specific Instruments with an Audio Query; 4.5 Source Separation via Latent Vector Interpolation; 4.6 Effect of Latent Vectors on Separation Performance; 4.7 Comparative Experiments Using Fine-Grained Class Information; 4.8 Iterative Separation; 4.9 Quantitative Evaluation)
    Chapter 5: Conclusion (5.1 Significance of the Study; 5.2 Future Work)
    Abstract

    Self-Supervised Music Source Separation Using Vector-Quantized Source Category Estimates

    Full text link
    Music source separation focuses on extracting distinct sonic elements from composite tracks. Historically, many methods have been grounded in supervised learning, which requires labeled data that is often limited in diversity. More recent methods have explored N-shot techniques that use one or more audio samples to aid the separation. A drawback of some of these methods, however, is that they require an audio query at inference time, making them less suited to genres with varied timbres and effects. This paper offers a proof-of-concept for a self-supervised music source separation system that eliminates the need for audio queries at inference time. While the training phase adopts a query-based approach, we introduce a modification that substitutes the continuous embeddings of query audio with Vector Quantized (VQ) representations. Trained end-to-end with up to N classes, as determined by the VQ codebook size, the model seeks to categorise instrument classes effectively. During inference, the input is partitioned into N sources, some of which may be left unused depending on the mix's instrument makeup. This methodology suggests an alternative avenue for source separation across diverse music genres. We provide examples and additional results online. Comment: 4 pages, 2 figures, 1 table; accepted at the 37th Conference on Neural Information Processing Systems (2023), Machine Learning for Audio Workshop.
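    The paper's own code is not part of this listing; the following is a minimal sketch of the vector-quantization step it describes, under stated assumptions (the class name, codebook size, and embedding dimension are hypothetical). A continuous query embedding is snapped to the nearest of N codebook entries, with a straight-through estimator so the encoder still receives gradients; each codebook index then plays the role of one candidate source category.

```python
# Minimal vector-quantization step (hypothetical sketch, not the paper's code).
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Maps continuous embeddings to the nearest of `num_codes` codebook entries."""
    def __init__(self, num_codes=8, dim=32):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)  # one code per source category

    def forward(self, z):                             # z: (B, dim)
        # Squared distance from each embedding to every code: (B, num_codes)
        d = (z.unsqueeze(1) - self.codebook.weight.unsqueeze(0)).pow(2).sum(-1)
        idx = d.argmin(dim=1)                         # hard assignment per sample
        z_q = self.codebook(idx)                      # quantized embedding
        # Straight-through estimator: backward pass treats quantization as identity
        z_q = z + (z_q - z).detach()
        return z_q, idx

vq = VectorQuantizer(num_codes=4, dim=32)             # codebook size bounds N
z_q, idx = vq(torch.randn(2, 32))                     # idx: category per sample
```

    The codebook size caps the number of separable categories, which matches the paper's framing: inference partitions the input into up to N sources, leaving some codes unused when the mix contains fewer instruments.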

    Gendering the Virtual Space: Sonic Femininities and Masculinities in Contemporary Top 40 Music

    Full text link
    This dissertation analyzes vocal placement, the apparent location of a voice in the virtual space created by a recording, and its relationship to gender. When listening to a piece of recorded music through headphones or stereo speakers, one hears various sound sources as though they were located in a virtual space (Clarke 2013). For instance, a specific vocal performance, once manipulated by various technologies in a recording studio, might evoke a concert hall, an intimate setting, or an otherworldly space. The placement of the voice within this space is one of the central musical parameters through which listeners ascribe cultural meanings to popular music. I develop an original methodology for analyzing vocal placement in recorded popular music. Combining close listening with music information retrieval tools, I precisely locate a voice's placement in virtual space according to five parameters: (1) Width, (2) Pitch Height, (3) Prominence, (4) Environment, and (5) Layering. I use the methodology to conduct close and distant readings of vocal placement in twenty-first-century Anglo-American popular music. First, an analysis of "Love the Way You Lie" (2010), by Eminem feat. Rihanna, showcases how the methodology can be used to support close readings of individual songs. Through my analysis, I suggest that Rihanna's wide vocal placement evokes a nexus of conflicting emotions in the wake of domestic violence. Eminem's narrow placement, conversely, expresses anger, frustration, and violence. Second, I use the analytical methodology to conduct a larger-scale study of vocal placement in a corpus of 113 post-2008 Billboard chart-topping collaborations between two or more artists. By stepping away from close readings of individual songs, I show how gender stereotypes are engineered en masse in the popular music industry. I show that women artists are generally assigned vocal placements that are wider, more layered, and more reverberated than those of men. This vocal placement configuration, exemplified in "Love the Way You Lie", creates a sonic contrast that presents women's voices as ornamental and diffuse, and men's voices as direct and relatable. I argue that these contrasting vocal placements sonically construct a gender binary, exemplifying one of the ways in which dichotomous conceptions of gender are reinforced through the sound of popular music.
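    The dissertation's five-parameter methodology pairs MIR tools with close listening and is not reducible to code, but two of the parameters could be given crude computational proxies. In the toy NumPy sketch below, the function names and formulas are illustrative assumptions, not the author's method: Width is approximated by side-to-mid energy and Prominence by the vocal-to-mix level.

```python
# Toy numeric proxies for two of the five parameters (illustrative only).
import numpy as np

def width_ratio(stereo):
    """Side-to-mid energy ratio as a rough proxy for Width.
    stereo: array of shape (2, n_samples); larger values suggest a wider image."""
    mid = 0.5 * (stereo[0] + stereo[1])
    side = 0.5 * (stereo[0] - stereo[1])
    return float(np.sum(side ** 2) / (np.sum(mid ** 2) + 1e-12))

def prominence_db(vocal, mix):
    """Vocal level relative to the full mix, in dB, as a rough proxy for
    Prominence (assumes an isolated vocal stem is available)."""
    return float(10 * np.log10(np.sum(vocal ** 2) / (np.sum(mix ** 2) + 1e-12)))

# Synthetic example: two highly correlated channels yield a narrow image.
rng = np.random.default_rng(0)
left = rng.standard_normal(44100)
right = 0.9 * left + 0.1 * rng.standard_normal(44100)
print(width_ratio(np.stack([left, right])))   # small ratio -> narrow placement
```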