
    Effects of errorless learning on the acquisition of velopharyngeal movement control

    Session 1pSC - Speech Communication: Cross-Linguistic Studies of Speech Sound Learning of the Languages of Hong Kong (Poster Session)

    The implicit motor learning literature suggests a benefit for learning if errors are minimized during practice. This study investigated whether the same principle holds for learning velopharyngeal movement control. Normal-speaking participants learned to produce hypernasal speech in either an errorless learning condition (in which the possibility for errors was limited) or an errorful learning condition (in which the possibility for errors was not limited). The nasality level of the participants' speech was measured by nasometer and reflected by nasalance scores (in %). Errorless learners began practicing hypernasal speech against a threshold nasalance score of 10%, which gradually increased to a threshold of 50% at the end. The same set of threshold targets was presented to errorful learners, but in reversed order. Errors were defined as the proportion of speech with a nasalance score below the threshold. The results showed that, relative to errorful learners, errorless learners displayed fewer errors (17.7% vs. 50.7%) and a higher mean nasalance score (46.7% vs. 31.3%) during the acquisition phase. Furthermore, errorless learners outperformed errorful learners in both retention and novel transfer tests. Acknowledgment: Supported by The University of Hong Kong Strategic Research Theme for Sciences of Learning. © 2012 Acoustical Society of America
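    The error measure described above (the proportion of speech falling below the practice threshold) can be sketched in a few lines. This is an illustrative reconstruction, not the study's analysis code; the frame scores and the helper name `error_proportion` are hypothetical.

    ```python
    import numpy as np

    def error_proportion(nasalance, threshold):
        """Proportion of speech frames whose nasalance score (%) falls
        below the practice threshold, i.e. 'errors' in this paradigm."""
        nasalance = np.asarray(nasalance, dtype=float)
        return float(np.mean(nasalance < threshold))

    # Hypothetical frame-by-frame nasalance scores (%) for one practice trial
    scores = [12.0, 25.0, 48.0, 55.0, 60.0]
    print(error_proportion(scores, threshold=50.0))  # 3 of 5 frames below 50% -> 0.6
    ```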

    Implementation of the Radiation Characteristics of Musical Instruments in Wave Field Synthesis Applications

    In this thesis, a method to implement the radiation characteristics of musical instruments in wave field synthesis systems is developed. It is applied and tested in two loudspeaker systems. Because the loudspeaker systems have a comparably low number of loudspeakers, the wave field is synthesized at discrete listening positions by solving a linear equation system. Thus, for every constellation of listening and source position, all loudspeakers can be used for the synthesis. The calculations are done in the spectral domain, neglecting sound propagation velocity at first. This approach causes artefacts in the loudspeaker signals and synthesis errors in the listening area, which are compensated by means of psychoacoustic methods. With these methods, the aliasing frequency is determined by the extent of the listening area, whereas in other wave field synthesis systems it is determined by the distance between adjacent loudspeakers.
    Musical instruments are simplified as complex point sources in order to capture, store and propagate their radiation characteristics. This method is the basis of the newly developed "Radiation Method", which improves the matrix conditioning of the equation system and the precision of the wave field synthesis by incorporating the radiation characteristics of the driven loudspeakers. In this work, the "Minimum Energy Method", originally developed for acoustic holography, is applied to wave field synthesis for the first time. It guarantees a robust solution and creates softer loudspeaker driving signals than the Radiation Method, but yields a worse approximation of the wave field beyond the discrete listening positions. Psychoacoustic considerations make a successful wave field synthesis possible: integration times of the auditory system determine the spatial dimensions within which the wave field synthesis approach works despite different arrival times and directions of wave fronts.
    By separating the spectrum into frequency bands of critical bandwidth, masking effects are utilized to reduce the amount of computation with hardly audible consequences. By applying the "Precedence Fade", the precedence effect is used to manipulate the perceived source position and improve the reproduction of the initial transients of notes. Based on Auditory Scene Analysis principles, "Fading Based Panning" creates precise phantom source positions between the actual loudspeaker positions. Physical measurements, simulations and listening tests provide evidence for the introduced methods and reveal their precision. Furthermore, results of the listening tests show that the perceived spaciousness of an instrumental sound does not necessarily go along with distinctness of localization. The introduced methods are compatible with conventional multichannel audio systems as well as other wave field synthesis applications.
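    The core idea of synthesizing the field only at discrete listening positions can be sketched as a per-frequency linear system: a transfer matrix maps loudspeaker driving signals to the sound pressure at the listening positions, and solving for the least-norm driving weights uses all loudspeakers at once. This is a minimal free-field sketch under assumed monopole sources and made-up geometry, not the thesis's actual system (which additionally handles radiation characteristics and psychoacoustic compensation).

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    L, P = 8, 4                    # loudspeakers, discrete listening positions
    c, f = 343.0, 1000.0           # speed of sound (m/s), frequency (Hz)
    k = 2 * np.pi * f / c          # wavenumber for this frequency bin

    spk = rng.uniform(-2.0, 2.0, (L, 2))     # loudspeaker positions (x, y)
    ear = rng.uniform(-0.5, 0.5, (P, 2))     # listening positions
    src = np.array([0.0, 3.0])               # virtual source position

    def green(a, b):
        """Free-field monopole transfer function between two points."""
        r = np.linalg.norm(a - b)
        return np.exp(-1j * k * r) / (4 * np.pi * r)

    G = np.array([[green(e, s) for s in spk] for e in ear])  # P x L system matrix
    d = np.array([green(e, src) for e in ear])               # desired field at the ears

    # Underdetermined system (P < L): the pseudoinverse gives the least-norm
    # solution, so every loudspeaker contributes for any source/listener layout.
    w = np.linalg.pinv(G) @ d
    print(np.max(np.abs(G @ w - d)))  # residual at the listening positions is ~0
    ```

    Solving one such system per frequency bin (and per source) reproduces the field exactly at the chosen positions; between them, the approximation quality depends on the methods the abstract describes.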

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform (STFT) domain. One of these bases is for the spectral energy of the acoustic echo signal and is formed from the incoming far-end user's speech, while the other basis is for the spectral energy of the near-end speaker and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes, for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC.
    Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk. Using a standard evaluation technique, the proposed algorithm is shown to have detection performance comparable to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum.
    Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
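    The decomposition onto the union of two fixed bases can be sketched as follows: with the echo basis and the pretrained speaker basis held fixed, only the activations are updated, and a Wiener-style mask splits the mixture magnitude into echo and near-end estimates. This is a generic KL-NMF sketch with random stand-in data and hypothetical dimensions, not the thesis's implementation.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    F, T, K = 64, 40, 8  # frequency bins, frames, atoms per basis (hypothetical)

    # W_echo: basis for the acoustic echo, formed from far-end speech;
    # W_near: near-end speaker basis trained a priori. Columns normalized.
    W_echo = np.abs(rng.standard_normal((F, K)))
    W_near = np.abs(rng.standard_normal((F, K)))
    W = np.hstack([W_echo, W_near])          # union of the two nonnegative bases
    W /= W.sum(axis=0, keepdims=True)

    V = np.abs(rng.standard_normal((F, T)))  # magnitude STFT of the near-end mic

    # Multiplicative updates for the activations H only (bases stay fixed),
    # minimizing the KL divergence D(V || W H).
    H = np.abs(rng.standard_normal((2 * K, T)))
    for _ in range(200):
        WH = W @ H + 1e-12
        H *= (W.T @ (V / WH)) / W.T.sum(axis=1, keepdims=True)

    # Wiener-style masks split the mixture into near-end and echo estimates.
    WH = W @ H + 1e-12
    near_mag = V * (W[:, K:] @ H[K:]) / WH
    echo_mag = V * (W[:, :K] @ H[:K]) / WH
    print(np.allclose(near_mag + echo_mag, V))  # the two estimates sum to the mixture
    ```

    In the echo-reduction setting described above, `echo_mag` also provides the echo estimate that the proposed DTD algorithm correlates against the available signals to form its decision variable.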
