    Directions for the future of technology in pronunciation research and teaching

    This paper reports on the role of technology in state-of-the-art pronunciation research and instruction, and makes concrete suggestions for future developments. The point of departure for this contribution is that the goal of second language (L2) pronunciation research and teaching should be enhanced comprehensibility and intelligibility as opposed to native-likeness. Three main areas are covered here. We begin with a presentation of advanced uses of pronunciation technology in research, with a special focus on the expertise required to carry out even small-scale investigations. Next, we discuss the nature of data in pronunciation research, pointing to ways in which future work can build on advances in corpus research and crowdsourcing. Finally, we consider how these insights pave the way for researchers and developers working to create research-informed, computer-assisted pronunciation teaching resources. We conclude with predictions for future developments.

    Interactive non-speech auditory display of multivariate data


    Modeling and rendering for development of a virtual bone surgery system

    A virtual bone surgery system is developed to provide a realistic, safe, and controllable environment for surgical education. It can be used for training in orthopedic surgery, as well as for planning and rehearsal of bone surgery procedures...Using the developed system, the user can perform virtual bone surgery by simultaneously seeing bone material removal through a graphic display device, feeling the force via a haptic device, and hearing the sound of tool-bone interaction --Abstract, page iii
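
    A common way to realize this kind of material removal with force feedback is a voxel model carved by a spherical tool tip, with a spring-like penalty force returned to the haptic device. The sketch below illustrates that general scheme only; the function name, spring constant, and voxel representation are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def carve_and_force(voxels, center, radius, k=800.0):
    """Remove voxels inside a spherical tool tip and return a penalty force
    pushing the tool away from the remaining material.
    Illustrative sketch; parameters are assumed, not from the paper."""
    ix, iy, iz = np.indices(voxels.shape)
    dist = np.sqrt((ix - center[0])**2 + (iy - center[1])**2 + (iz - center[2])**2)
    contact = (dist < radius) & voxels            # bone voxels touched by the tool
    if not contact.any():
        return np.zeros(3)
    pts = np.stack([ix[contact], iy[contact], iz[contact]], axis=1)
    direction = np.asarray(center, float) - pts.mean(axis=0)
    norm = np.linalg.norm(direction)
    direction = direction / norm if norm > 0 else np.array([0.0, 0.0, 1.0])
    force = k * (radius - dist[contact]).mean() * direction  # spring-like penalty
    voxels[contact] = False                       # material removal
    return force

# Example: carve a small sphere out of a solid 64^3 bone block
bone = np.ones((64, 64, 64), dtype=bool)
f = carve_and_force(bone, center=(32, 32, 32), radius=4.0)
```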

    Musical Cross Synthesis using Matrix Factorisation

    The focus of this work is to explore a new method for the creative analysis and manipulation of musical audio content. Given a target song and a source song, the goal is to reconstruct the harmonic and rhythmic structure of the target with the timbral components of the source, in such a way that both the target and the source material remain recognizable to the listener. We refer to this operation as musical cross-synthesis. For this purpose, we propose the use of a matrix factorisation method, specifically Shift-Invariant Probabilistic Latent Component Analysis (PLCA). The inputs to the PLCA algorithm are beat-synchronous CQT basis functions of the source, whose temporal activations are used to approximate the CQT of the target. The shift-invariant property of PLCA allows each basis function to be subjected to a range of possible pitch shifts, which increases the flexibility of the source to represent the target. To create the resulting cross-synthesis, the beat-synchronous, pitch-shifted CQT basis functions are inverted and concatenated in time.
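
    As a rough illustration of the activation idea, the sketch below fits nonnegative activations of source spectra to a target's magnitude spectrogram and resynthesizes using the target's phase. It deliberately simplifies: a plain STFT dictionary and KL-NMF multiplicative updates stand in for the paper's beat-synchronous, pitch-shifted CQT bases and shift-invariant PLCA, so treat it as a sketch of the general approach, not the authors' method.

```python
import numpy as np
import librosa

def cross_synthesize(source_path, target_path, sr=22050, n_fft=2048, hop=512):
    # Source spectra act as a fixed dictionary; the target's structure is
    # approximated by nonnegative activations of those spectra (KL-NMF).
    src, _ = librosa.load(source_path, sr=sr)
    tgt, _ = librosa.load(target_path, sr=sr)
    S = np.abs(librosa.stft(src, n_fft=n_fft, hop_length=hop))   # (freq, src_frames)
    T = librosa.stft(tgt, n_fft=n_fft, hop_length=hop)
    T_mag = np.abs(T)                                            # (freq, tgt_frames)
    H = np.random.rand(S.shape[1], T_mag.shape[1]).astype(S.dtype)
    eps = 1e-9
    for _ in range(50):  # multiplicative updates for H only; dictionary stays fixed
        H *= (S.T @ (T_mag / (S @ H + eps))) / (S.T @ np.ones_like(T_mag) + eps)
    recon = S @ H                    # target structure rendered from source spectra
    # Quick-and-dirty inversion: borrow the target's phase
    return librosa.istft(recon * np.exp(1j * np.angle(T)), hop_length=hop)
```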

    Integrated Framework for Interaction and Annotation of Multimodal Data

    Ahmed, Afroza. MS. The University of Memphis. August 2010. Integrated Framework for Interaction and Annotation of Multimodal Data. Major Professor: Mohammed Yeasin, Ph.D. This thesis aims to develop an integrated framework and intuitive user interface to interact with, annotate, and analyze multimodal data (i.e., video, image, audio, and text data). The proposed framework has three layers: (i) interaction, (ii) annotation, and (iii) analysis or modeling. These three layers are seamlessly wrapped together using a user-friendly interface designed according to proven principles from industry practice. The key objective is to facilitate interaction with multimodal data at various levels of granularity. In particular, the proposed framework allows interaction with the multimodal data at three levels: (i) the raw level, (ii) the feature level, and (iii) the semantic level. The main function of the proposed framework is to provide an efficient way to annotate the raw multimodal data to create proper ground-truth metadata. The annotated data is used for visual analysis, co-analysis, and modeling of underlying concepts such as dialog acts, continuous gestures, and spontaneous emotions. The key challenge is to integrate code (computer programs) written in different programming languages and on different platforms, and to display the results and the multimodal data in one platform. This fully integrated tool achieves the stated goals and objectives and is a valuable addition to the short list of existing tools useful for the interaction, annotation, and analysis of multimodal data.
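
    One plausible shape for such ground-truth metadata is a single record type that spans the raw, feature, and semantic levels. The sketch below is a hypothetical illustration; the field names are assumptions, not taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Annotation:
    modality: str                 # "video", "image", "audio", or "text"
    level: str                    # "raw", "feature", or "semantic"
    start: float                  # segment start (seconds, or char offset for text)
    end: float                    # segment end
    label: str                    # e.g. a dialog act, gesture, or emotion tag
    annotator: Optional[str] = None
    attributes: dict = field(default_factory=dict)  # free-form extra metadata

# Example: a semantic-level tag over a video segment
a = Annotation("video", "semantic", start=12.4, end=15.1, label="emotion:surprise")
```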

    Spatial cues alone produce inaccurate sound segregation: The effect of interaural time differences

    To clarify the role of spatial cues in sound segregation, this study explored whether interaural time differences (ITDs) are sufficient to allow listeners to identify a novel sound source from a mixture of sources. Listeners heard mixtures of two synthetic sounds, a target and distractor, each of which possessed naturalistic spectrotemporal correlations but otherwise lacked strong grouping cues, and which contained either the same or different ITDs. When the task was to judge whether a probe sound matched a source in the preceding mixture, performance improved greatly when the same target was presented repeatedly across distinct distractors, consistent with previous results. In contrast, performance improved only slightly with ITD separation of target and distractor, even when spectrotemporal overlap between target and distractor was reduced. However, when subjects localized, rather than identified, the sources in the mixture, sources with different ITDs were reported as two sources at distinct and accurately identified locations. ITDs alone thus enable listeners to perceptually segregate mixtures of sources, but the perceived content of these sources is inaccurate when other segregation cues, such as harmonicity and common onsets and offsets, do not also promote proper source separation.
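
    An ITD is imposed on a stimulus simply by delaying one ear's copy of the signal by a fraction of a millisecond. A minimal sketch of that manipulation (parameter values are illustrative, not those used in the study):

```python
import numpy as np

def apply_itd(mono, sr, itd_us):
    """Render a mono signal as a stereo pair whose channels differ only by
    an interaural time difference (positive ITD delays the right ear)."""
    delay = int(round(abs(itd_us) * 1e-6 * sr))          # ITD in whole samples
    lead = np.concatenate([mono, np.zeros(delay)])
    lag = np.concatenate([np.zeros(delay), mono])
    left, right = (lead, lag) if itd_us >= 0 else (lag, lead)
    return np.stack([left, right], axis=0)               # shape (2, n + delay)

# Example: a 100 ms noise burst lateralized with a 500 microsecond ITD
sr = 44100
burst = np.random.randn(sr // 10)
stereo = apply_itd(burst, sr, itd_us=500)
```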

    Deep learning-based music source separation

    This thesis addresses the problem of music source separation using deep learning methods. Deep learning-based separation of music sources is examined from three angles: signal processing, neural architecture, and signal representation. From the first angle, the aim is to understand what deep learning models based on deep neural networks (DNNs) learn for the task of music source separation, and whether there is an analogous signal processing operator that characterizes the functionality of these models. To do so, a novel algorithm is presented. The algorithm, referred to as the neural couplings algorithm (NCA), distills an optimized separation model consisting of non-linear operators into a single linear operator that is easy to interpret. Using the NCA, it is shown that DNNs learn data-driven filters for singing voice separation that can be assessed using signal processing. Moreover, by enabling DNNs to learn how to predict filters for source separation, DNNs capture the structure of the target source and learn robust filters. From the second angle, the aim is to propose a neural network architecture that incorporates the aforementioned concept of filter prediction and optimization. For this purpose, the neural network architecture referred to as the Masker-and-Denoiser (MaD) is presented. The proposed architecture realizes the filtering operation using skip-filtering connections. Additionally, a few inference strategies and optimization objectives are proposed and discussed. The performance of MaD in music source separation is assessed through a series of experiments that include both objective and subjective evaluation processes. Experimental results suggest that the MaD architecture, with some of the studied strategies, is applicable to realistic music recordings, and MaD has been considered one of the state-of-the-art approaches in the Signal Separation and Evaluation Campaign (SiSEC) 2018. Finally, the focus of the third angle is to employ DNNs for learning signal representations that are helpful for separating music sources. To that end, a new method is proposed using a novel re-parameterization scheme and a combination of optimization objectives. The re-parameterization is based on sinusoidal functions that promote interpretable DNN representations. Results from the conducted experiments suggest that the proposed method can be efficiently employed in learning interpretable representations, where the filtering process can still be applied to separate music sources. Furthermore, the use of optimal transport (OT) distances as optimization objectives is useful for computing additive and distinctly structured signal representations for various types of music sources.
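
    The core of a skip-filtering connection is that the network outputs a mask that multiplies its own input, so the model learns to filter the mixture rather than to synthesize the source directly. A minimal PyTorch sketch of that idea (layer types and sizes are assumptions; this is not the MaD architecture itself):

```python
import torch
import torch.nn as nn

class SkipFilteringSketch(nn.Module):
    """Predicts a time-frequency mask and multiplies it with its own input
    (the skip-filtering connection), so the network learns to filter the
    mixture rather than synthesize the source directly."""
    def __init__(self, n_bins=1025, hidden=512):
        super().__init__()
        self.rnn = nn.GRU(n_bins, hidden, batch_first=True, bidirectional=True)
        self.mask = nn.Sequential(nn.Linear(2 * hidden, n_bins), nn.Sigmoid())

    def forward(self, mix_mag):            # mix_mag: (batch, frames, n_bins)
        h, _ = self.rnn(mix_mag)
        m = self.mask(h)                   # predicted filter, values in [0, 1]
        return m * mix_mag                 # skip-filtering: mask applied to input

# Example: estimate a source magnitude from a random mixture spectrogram
est = SkipFilteringSketch()(torch.rand(1, 100, 1025))
```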