8,744 research outputs found

    Multichannel Music Separation with Deep Neural Networks

    This article addresses the problem of multichannel music separation. We propose a framework in which the source spectra are estimated using deep neural networks and combined with spatial covariance matrices that encode the spatial characteristics of the sources. The parameters are estimated iteratively with an expectation-maximization algorithm and used to derive a multichannel Wiener filter. We evaluate the proposed framework on the task of music separation using a large dataset. Experimental results show that the method performs consistently well in separating singing voice and other instruments from realistic musical mixtures.
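    The abstract gives no formulas. As a hedged sketch, the usual multichannel Gaussian model behind this kind of framework can be written as follows. The notation is an assumption, not taken from the abstract: x(f,n) is the mixture STFT vector, c_j(f,n) the spatial image of source j, v_j(f,n) its DNN-estimated power spectrum, and R_j(f) its spatial covariance matrix.

```latex
\begin{align}
\mathbf{x}(f,n) &= \sum_{j=1}^{J} \mathbf{c}_j(f,n), \qquad
\mathbf{c}_j(f,n) \sim \mathcal{N}_c\big(\mathbf{0},\, v_j(f,n)\,\mathbf{R}_j(f)\big), \\
\widehat{\mathbf{c}}_j(f,n) &= v_j(f,n)\,\mathbf{R}_j(f)
\Big(\sum_{j'=1}^{J} v_{j'}(f,n)\,\mathbf{R}_{j'}(f)\Big)^{-1} \mathbf{x}(f,n).
\end{align}
```

    The second line is the multichannel Wiener filter. In an EM scheme of this type, the spectra and spatial covariances are re-estimated from the current source estimates before the filter is applied again.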

    Trennung und Schätzung der Anzahl von Audiosignalquellen mit Zeit- und Frequenzüberlappung (Separation and Count Estimation of Audio Signal Sources with Time and Frequency Overlap)

    Everyday audio recordings involve mixture signals: music contains a mixture of instruments; in a meeting or conference, there is a mixture of human voices. For these mixtures, automatically separating or estimating the number of sources is a challenging task. A common assumption when processing mixtures in the time-frequency domain is that sources are not fully overlapped. However, in this work we consider some cases where the overlap is severe (for instance, when instruments play the same note in unison, or when many people speak concurrently, the "cocktail party" scenario), highlighting the need for new representations and more powerful models. To address the problems of source separation and count estimation, we use conventional signal processing techniques as well as deep neural networks (DNN). We first address the source separation problem for unison instrument mixtures, studying the distinct spectro-temporal modulations caused by vibrato. To exploit these modulations, we developed a method based on time warping, informed by an estimate of the fundamental frequency. For cases where such estimates are not available, we present an unsupervised model, inspired by the way humans group time-varying sources (common fate). This contribution comes with a novel representation that improves separation for overlapped and modulated sources on unison mixtures but also improves vocal and accompaniment separation when used as an input for a DNN model. Then, we focus on estimating the number of sources in a mixture, which is important for real-world scenarios. Our work on count estimation was motivated by a study on how humans can address this task, which led us to conduct listening experiments, confirming that humans are only able to estimate the number of up to four sources correctly. To answer the question of whether machines can perform similarly, we present a DNN architecture, trained to estimate the number of concurrent speakers. 
Our results show improvements compared to other methods, and the model even outperformed humans on the same task. In both the source separation and source count estimation tasks, the key contribution of this thesis is the concept of “modulation”, which is important to computationally mimic human performance. Our proposed Common Fate Transform is an adequate representation to disentangle overlapping signals for separation, and an inspection of our DNN count estimation model revealed that it proceeds to find modulation-like intermediate features.

    Exploring distance learning experiences of in-service music teachers from Puerto Rico in a master's program

    Thesis (D.M.A.)--Boston University. The purpose of this study was to explore the experiences of in-service music teachers who chose to pursue a master's degree in music education through distance learning. In this study, I examined the motivations of in-service music teachers for choosing to pursue a master's degree in music education through distance learning; the benefits teachers reported as a result of enrolling in a distance learning program; the challenges teachers faced when studying in an online distance learning graduate program; and the learning experiences teachers found significant for their profession and teaching environments. Teachers who pursued a master's degree in music education through distance learning at Cambridge College Puerto Rico Regional Center comprised the sample. The primary data collection method was individual semi-structured interviews. Results showed that the experiences gained by in-service music teachers increased their capacity in teaching pedagogy, theoretical understanding of the field, communication skills, and capability in handling technological issues. Satisfied students significantly outnumbered dissatisfied ones. The salient disadvantages reported by the sample group included a technological gap, reduced direct interaction with professors, a need for self-motivation, and a reduced practical ability between the moderators and the students. On the other hand, the primary advantage of distance learning was the convenience and flexibility of pursuing a music education degree online, which allowed the in-service music teachers to study at home and to balance their domestic and professional responsibilities. 
The participants' main reasons for enrolling in an online degree program were a desire to excel in their careers, the lack of a geographically closer option, professional and/or family lifestyles, a need for increasing academic knowledge, and a need to improve teaching capability and capacity. Recommendations are offered for leaders and institutions engaged in distance learning programs to address the challenges raised by students who have gone through the system. I hope that the knowledge gained from this study will expand both scholars' and prospective students' current understanding of distance learning as an educational model, especially in the music education field.

    Running Away from White America : Challenging Patriarchal Masculinities through Childish Gambino's Music Video "This Is America"

    This dissertation analyses masculinity in Childish Gambino's highly successful music video "This Is America" (2018) and in three of its multiple covers. It focuses on the challenge to patriarchal masculinities that this music video has initiated on an international level. To this end, the music video and its impact are explored from the perspective of Cultural Studies. The dissertation attempts to decipher the symbolism of the videos, drawing attention to crucial concepts such as identity, race and gender.

    Deep neural network based multichannel audio source separation

    This chapter presents a multichannel audio source separation framework in which deep neural networks (DNNs) are used to model the source spectra and are combined with the classical multichannel Gaussian model to exploit the spatial information. The parameters are estimated iteratively with an expectation-maximization (EM) algorithm and used to derive a multichannel Wiener filter. Different design choices and their impact on performance are discussed, including the cost functions for DNN training, the number of parameter updates, the use of multiple DNNs, and the use of weighted parameter updates. Finally, we present the framework's application to a speech enhancement task and a music separation task. The experimental results show the benefit of the multichannel DNN-based approach over a single-channel DNN-based approach and over the multichannel nonnegative matrix factorization based iterative EM framework.
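    The filtering step named in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name, array layouts, and the `eps` regularizer are assumptions made for illustration, and random arrays stand in for the DNN spectra `v` and the EM-estimated spatial covariances `R`.

```python
import numpy as np


def multichannel_wiener_filter(x, v, R, eps=1e-10):
    """Apply a multichannel Wiener filter (illustrative sketch).

    x: (F, N, I) complex mixture STFT (F bins, N frames, I channels)
    v: (J, F, N) nonnegative source power spectra (stand-in for DNN outputs)
    R: (J, F, I, I) Hermitian spatial covariance matrices
    Returns (J, F, N, I) estimated source spatial images.
    """
    n_chan = x.shape[-1]
    # Per-source covariance v_j(f, n) * R_j(f): shape (J, F, N, I, I)
    sigma_c = v[..., None, None] * R[:, :, None, :, :]
    # Mixture covariance is the sum over sources, lightly regularized
    sigma_x = sigma_c.sum(axis=0) + eps * np.eye(n_chan)
    # Wiener gain W_j = Sigma_cj @ inv(Sigma_x): shape (J, F, N, I, I)
    gains = sigma_c @ np.linalg.inv(sigma_x)
    # Apply each gain matrix to the mixture vector
    return (gains @ x[None, ..., None])[..., 0]


# Hypothetical demo with random data (2 sources, 2 channels)
rng = np.random.default_rng(0)
J, F, N, I = 2, 5, 7, 2
x = rng.standard_normal((F, N, I)) + 1j * rng.standard_normal((F, N, I))
v = rng.random((J, F, N)) + 0.1
A = rng.standard_normal((J, F, I, I)) + 1j * rng.standard_normal((J, F, I, I))
# A @ A^H + identity gives Hermitian positive-definite matrices
R = A @ np.conj(np.swapaxes(A, -1, -2)) + np.eye(I)
c_hat = multichannel_wiener_filter(x, v, R)
```

    Because the gains sum to (approximately) the identity matrix, the source estimates add back up to the mixture, which gives a quick sanity check on any implementation of this step.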

    A Weighted Individual Performance-Based Assessment for Middle School Orchestral Strings: Establishing Validity and Reliability

    The study established the validity and reliability of a weighted individual performance-based assessment tool within the utility scope of middle school orchestral strings. The following research questions guided this study: 1. What specific string-playing behaviors and corresponding criteria validate a weighted individual performance-based assessment tool for middle school orchestral strings? 2. What are the psychometric properties of the weighted individual performance-based assessment tool in authentic situations? For Research Question 1, the expert panel and I reached 100% mutual agreement on the 10 string-playing behaviors that make up the DISAT: tempo, rhythm, tone, pitch, intonation, technique, bowing, dynamics, phrasing, and posture. Being interdependent, these string-playing behaviors are relevant because they encompass every necessary facet of orchestral string performance (Zdzinski & Barnes, 2002). According to Zdzinski and Barnes (2002), an orchestral string performance assessment must evaluate each facet of a participant's playing ability to rate the overall musicianship. Bergee and Rossin (2019) stated that it is important for a musical assessment to draw on various aspects of a performance. The DISAT obtained a reliability of 0.872 by having enough variance between raters in the authentic situation. Linacre (2015) stated that reliability greater than 0.8 is acceptable to distinguish separation between raters. Combined with the expert panel's 100% mutual agreement on content validity, this proved the DISAT to be a valid and reliable assessment tool for individual performance-based orchestral strings assessment (AERA, APA, & NCME, 2014). The DISAT can be utilized by districts and middle school orchestral string music teachers in North Carolina. Being a consistent, objective tool, the DISAT can standardize our approach to middle school orchestral string music education assessment (AERA, APA, & NCME, 2014). 
The data collected by the DISAT could easily track the musical progression of students while giving opportunities for constructive, purposeful feedback.