8,744 research outputs found

Multichannel Music Separation with Deep Neural Networks

Author: Liutkus Antoine
Nugraha Aditya Arie
Vincent Emmanuel
Publication venue: HAL CCSD
Publication date: 29/08/2016

This article addresses the problem of multichannel music separation. We propose a framework where the source spectra are estimated using deep neural networks and combined with spatial covariance matrices to encode the source spatial characteristics. The parameters are estimated in an iterative expectation-maximization fashion and used to derive a multichannel Wiener filter. We evaluate the proposed framework for the task of music separation on a large dataset. Experimental results show that the method we describe performs consistently well in separating singing voice and other instruments from realistic musical mixtures.
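The filtering step mentioned in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the usual local Gaussian model, in which source j has covariance v_j(f,n) R_j(f) in each time-frequency bin, and the function name, argument names, array shapes, and the `eps` regularizer are all hypothetical choices made for this example. The source spectra `v` would come from the DNNs in the described framework; here they are simply an input array.

```python
import numpy as np

def multichannel_wiener_filter(x, v, R, eps=1e-10):
    """Estimate source spatial images with a multichannel Wiener filter.

    Assumed (hypothetical) shapes:
      x: (F, N, I) complex mixture STFT, I channels
      v: (J, F, N) nonnegative source power spectra (e.g. DNN outputs)
      R: (J, F, I, I) Hermitian spatial covariance matrices
    Returns an array of shape (J, F, N, I).
    """
    n_channels = x.shape[-1]
    # Per-source covariance v_j(f,n) * R_j(f), shape (J, F, N, I, I)
    source_cov = v[..., None, None] * R[:, :, None, :, :]
    # Mixture covariance is the sum over sources; eps keeps it invertible
    mix_cov = source_cov.sum(axis=0) + eps * np.eye(n_channels)
    # Solve mix_cov @ y = x in every bin instead of forming an explicit inverse
    y = np.linalg.solve(mix_cov, x[..., None])  # (F, N, I, 1)
    # Source image estimate: v_j R_j (sum_k v_k R_k)^{-1} x
    return (source_cov @ y)[..., 0]
```

Under this model the estimated source images sum back to the mixture (up to the small `eps` term), which gives a quick sanity check on any implementation.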


INRIA a CCSD electronic archive server

Trennung und Schätzung der Anzahl von Audiosignalquellen mit Zeit- und Frequenzüberlappung

Author: Stöter Fabian-Robert
Publication date: 01/01/2020

Everyday audio recordings involve mixture signals: music contains a mixture of instruments; in a meeting or conference, there is a mixture of human voices. For these mixtures, automatically separating or estimating the number of sources is a challenging task. A common assumption when processing mixtures in the time-frequency domain is that sources are not fully overlapped. However, in this work we consider some cases where the overlap is severe, for instance when instruments play the same note (unison) or when many people speak concurrently ("cocktail party"), highlighting the need for new representations and more powerful models. To address the problems of source separation and count estimation, we use conventional signal processing techniques as well as deep neural networks (DNN). We first address the source separation problem for unison instrument mixtures, studying the distinct spectro-temporal modulations caused by vibrato. To exploit these modulations, we developed a method based on time warping, informed by an estimate of the fundamental frequency. For cases where such estimates are not available, we present an unsupervised model, inspired by the way humans group time-varying sources (common fate). This contribution comes with a novel representation that improves separation for overlapped and modulated sources on unison mixtures but also improves vocal and accompaniment separation when used as an input for a DNN model. Then, we focus on estimating the number of sources in a mixture, which is important for real-world scenarios. Our work on count estimation was motivated by a study on how humans can address this task, which led us to conduct listening experiments, confirming that humans are only able to correctly estimate the number of sources up to four. To answer the question of whether machines can perform similarly, we present a DNN architecture trained to estimate the number of concurrent speakers. 
Our results show improvements compared to other methods, and the model even outperformed humans on the same task. In both the source separation and source count estimation tasks, the key contribution of this thesis is the concept of "modulation", which is important to computationally mimic human performance. Our proposed Common Fate Transform is an adequate representation to disentangle overlapping signals for separation, and an inspection of our DNN count estimation model revealed that it proceeds to find modulation-like intermediate features.

Exploring distance learning experiences of in-service music teachers from Puerto Rico in a master's program

Author: Vega-Martinez Juan Carlos
Publication venue: Boston University
Publication date: 01/01/2013

Thesis (D.M.A.), Boston University. The purpose of this study was to explore the experiences of in-service music teachers who chose to pursue a master's degree in music education through distance learning. In this study, I examined the motivations of in-service music teachers for choosing to pursue a master's degree in music education through distance learning; the benefits teachers reported as a result of enrolling in a distance learning program; the challenges teachers faced when studying in an online distance learning graduate program; and the learning experiences teachers found significant for their profession and teaching environments. Teachers who pursued a master's degree in music education through distance learning at Cambridge College Puerto Rico Regional Center comprised the sample. The primary data collection method was individual semi-structured interviews. Results indicated that the experiences gained by in-service music teachers increased their capacity in teaching pedagogy, theoretical understanding of the field, communication skills, and capability in handling technological issues. The difference between the number of students satisfied and dissatisfied with the program was significant, with the former outnumbering the latter. The salient disadvantages reported by the sample group included a technological gap, reduced direct interaction with professors, a need for self-motivation, and a reduced practical ability between the moderators and the students. On the other hand, the primary advantage of distance learning was the convenience and flexibility of pursuing a music education degree online, which allowed the in-service music teachers to study at home and gave them the capability to balance their domestic and professional responsibilities. 
The participants' main reasons for enrolling in an online degree program were a desire to excel in their careers, the lack of a geographically closer option, professional and/or family lifestyles, a need for increasing academic knowledge, and a need to improve teaching capability and capacity. Recommendations are offered for leaders and institutions engaged in distance learning programs to address the challenges raised by students who have gone through the system. I hope that the knowledge gained from this study will expand both scholars' and prospective students' current understanding of distance learning as an educational model, especially in the music education field.

Boston University Institutional Repository (OpenBU)

Running Away from White America : Challenging Patriarchal Masculinities through Childish Gambino's Music Video "This Is America"

Author: Delgado López Andrea
Universitat Autònoma de Barcelona. Departament de Filologia Anglesa i de Germanística
Universitat Autònoma de Barcelona. Facultat de Filosofia i Lletres
Publication date: 01/01/2021

This dissertation analyses masculinity in Childish Gambino's highly successful music video "This Is America" (2018) and in three of its multiple covers. Consequently, it focuses on the challenge against patriarchal masculinities that this music video has initiated on an international level. In order to accomplish this goal, the music video and its impact have been explored taking the field of Cultural Studies as a basis. Therefore, this dissertation attempts to decipher the symbolism of the videos drawing attention to crucial concepts such as identity, race and gender.

Diposit Digital de Documents de la UAB

Deep neural network based multichannel audio source separation

Author: Liutkus Antoine
Nugraha Aditya Arie
Vincent Emmanuel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2018

This chapter presents a multichannel audio source separation framework where deep neural networks (DNNs) are used to model the source spectra and combined with the classical multichannel Gaussian model to exploit the spatial information. The parameters are estimated in an iterative expectation-maximization (EM) fashion and used to derive a multichannel Wiener filter. Different design choices and their impact on the performance are discussed. They include the cost functions for DNN training, the number of parameter updates, the use of multiple DNNs, and the use of weighted parameter updates. Finally, we present its application to a speech enhancement task and a music separation task. The experimental results show the benefit of the multichannel DNN-based approach over a single-channel DNN-based approach and the multichannel nonnegative matrix factorization based iterative EM framework.
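One iteration of the spatial parameter update in such an EM scheme might look like the sketch below. It follows the standard (unweighted) EM update for full-rank spatial covariance matrices under a multichannel Gaussian model, which is an assumption here: the abstract does not spell out the update equations, and the chapter's weighted variant and DNN-based spectral update are not reproduced. The function name, argument names, shapes, and `eps` are hypothetical.

```python
import numpy as np

def update_spatial_covariances(x, v, R, eps=1e-10):
    """One EM-style update of the spatial covariances R_j(f).

    Assumed (hypothetical) shapes:
      x: (F, N, I) complex mixture STFT
      v: (J, F, N) nonnegative source power spectra
      R: (J, F, I, I) current Hermitian spatial covariances
    Returns the updated covariances, shape (J, F, I, I).
    """
    eye = np.eye(x.shape[-1])
    # Per-source and mixture covariances under the Gaussian model
    source_cov = v[..., None, None] * R[:, :, None, :, :]   # (J, F, N, I, I)
    mix_cov = source_cov.sum(axis=0) + eps * eye            # (F, N, I, I)
    # E-step: Wiener gains and posterior mean of each source image
    W = source_cov @ np.linalg.inv(mix_cov)                 # (J, F, N, I, I)
    c_hat = (W @ x[..., None])[..., 0]                      # (J, F, N, I)
    # Posterior second-order moment: mean outer product plus residual covariance
    outer = c_hat[..., :, None] * np.conj(c_hat[..., None, :])
    C_hat = outer + (eye - W) @ source_cov
    # M-step: average over time frames, normalized by the source spectra
    return np.mean(C_hat / np.maximum(v, eps)[..., None, None], axis=2)
```

In a full pipeline this step would alternate with a refresh of the source spectra `v` (by DNNs in the framework described above) for a fixed number of iterations before the final Wiener filtering.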

INRIA a CCSD electronic archive server

A Weighted Individual Performance-Based Assessment for Middle School Orchestral Strings: Establishing Validity and Reliability

Author: Ward Kevin
Publication venue: Digital Commons @ Gardner-Webb University
Publication date: 01/01/2022

The study established the validity and reliability of a weighted individual performance-based assessment tool within the utility scope of middle school orchestral strings. The following research questions guided this study: 1. What specific string-playing behaviors and corresponding criteria validate a weighted individual performance-based assessment tool for middle school orchestral strings? 2. What are the psychometric properties of the weighted individual performance-based assessment tool in authentic situations? For Research Question 1, the expert panel and I reached 100% mutual agreement on 10 string-playing behaviors (tempo, rhythm, tone, pitch, intonation, technique, bowing, dynamics, phrasing, and posture) that created the DISAT. Being interdependent, these string-playing behaviors are relevant because they encompass every necessary facet of orchestral string performance (Zdzinski & Barnes, 2002). According to Zdzinski and Barnes (2002), an orchestral string performance assessment must evaluate each facet of a participant’s playing ability to rate the overall musicianship. Bergee and Rossin (2019) stated in their research that it is important to have various aspects of a performance utilized in a musical assessment. The DISAT obtained reliability of 0.872 by having enough variance between raters in the authentic situation. Linacre (2015) stated that reliability greater than 0.8 is acceptable to distinguish separation between raters. Combined with the expert panel's 100% mutual agreement on content validity, this proved the DISAT to be a valid and reliable assessment tool for individual performance-based orchestral strings assessment (AERA, APA, & NCME, 2014). The DISAT can be utilized by districts and middle school orchestral string music teachers in North Carolina. Being a consistent, objective tool, the DISAT can standardize our approach to middle school orchestral string music education assessment (AERA, APA, & NCME, 2014). 
The data collected by the DISAT could easily track the musical progression of students while giving opportunities for constructive, purposeful feedback.

Digital Commons @ Gardner-Webb University