29 research outputs found
Effects of errorless learning on the acquisition of velopharyngeal movement control
Session 1pSC - Speech Communication: Cross-Linguistic Studies of Speech Sound Learning of the Languages of Hong Kong (Poster Session)

The implicit motor learning literature suggests a benefit for learning if errors are minimized during practice. This study investigated whether the same principle holds for learning velopharyngeal movement control. Normal-speaking participants learned to produce hypernasal speech in either an errorless learning condition (in which the possibility for errors was limited) or an errorful learning condition (in which it was not). Nasality level of the participants' speech was measured by nasometer and reflected in nasalance scores (in %). Errorless learners practiced producing hypernasal speech with a threshold nasalance score of 10% at the beginning, which gradually increased to a threshold of 50% at the end. The same set of threshold targets was presented to errorful learners, but in reverse order. Errors were defined as the proportion of speech with a nasalance score below the threshold. The results showed that, relative to errorful learners, errorless learners displayed fewer errors (17.7% vs. 50.7%) and a higher mean nasalance score (46.7% vs. 31.3%) during the acquisition phase. Furthermore, errorless learners outperformed errorful learners in both retention and novel transfer tests. Acknowledgment: Supported by The University of Hong Kong Strategic Research Theme for Sciences of Learning. © 2012 Acoustical Society of America
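As an illustration, the practice-threshold schedule and error definition described above can be sketched in Python. The number of practice blocks and the linear spacing are assumptions made for the sketch, not details taken from the study:

```python
import numpy as np

def nasalance_thresholds(start=10.0, end=50.0, n_blocks=5, errorless=True):
    """Hypothetical practice-threshold schedule (illustrative sketch).

    Errorless learners start with an easy 10% nasalance threshold that
    gradually rises to 50%; errorful learners see the same thresholds
    in reverse order. Block count and spacing are assumptions.
    """
    thresholds = np.linspace(start, end, n_blocks)
    return thresholds if errorless else thresholds[::-1]

def error_rate(nasalance_scores, threshold):
    """Errors = proportion of speech samples below the current threshold."""
    scores = np.asarray(nasalance_scores, dtype=float)
    return float(np.mean(scores < threshold))
```

Under this schedule, early errorless-condition trials are almost guaranteed to exceed the 10% threshold, which is what limits the opportunity for errors.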
Single-Microphone Speech Enhancement and Separation Using Deep Learning
The cocktail party problem comprises the challenging task of understanding a
speech signal in a complex acoustic environment, where multiple speakers and
background noise signals simultaneously interfere with the speech signal of
interest. A signal processing algorithm that can effectively increase the
speech intelligibility and quality of speech signals in such complicated
acoustic situations is highly desirable, especially for applications involving
mobile communication devices and hearing assistive devices. Due to the
re-emergence of machine learning techniques, today known as deep learning, the
challenges involved in designing such algorithms might now be overcome. In this PhD thesis,
we study and develop deep learning-based techniques for two sub-disciplines of
the cocktail party problem: single-microphone speech enhancement and
single-microphone multi-talker speech separation. Specifically, we conduct
in-depth empirical analysis of the generalization capability of modern deep
learning-based single-microphone speech enhancement algorithms. We show that
the performance of such algorithms is closely tied to the training data, and that
good generalization can be achieved with carefully designed training data.
Furthermore, we propose uPIT, a deep learning-based algorithm for
single-microphone speech separation and we report state-of-the-art results on a
speaker-independent multi-talker speech separation task. Additionally, we show
that uPIT works well for joint speech separation and enhancement without
explicit prior knowledge about the noise type or number of speakers. Finally,
we show that deep learning-based speech enhancement algorithms designed to
minimize the classical short-time spectral amplitude mean squared error lead
to enhanced speech signals that are essentially optimal in terms of STOI, a
state-of-the-art speech intelligibility estimator.

Comment: PhD thesis, 233 pages.
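The core idea behind utterance-level permutation invariant training, as described above, is that the assignment of model outputs to speakers is not fixed in advance; the loss is instead the minimum mean squared error over all pairings of estimated and reference signals, evaluated over the whole utterance. A minimal NumPy illustration of such a loss (function name and array shapes are illustrative, not taken from the thesis):

```python
from itertools import permutations
import numpy as np

def upit_loss(estimates, references):
    """Utterance-level permutation invariant MSE loss (sketch).

    estimates, references: arrays of shape (S, T) holding S estimated and
    S reference signals (e.g. flattened spectrogram frames) for one
    utterance. Returns the MSE under the speaker permutation that fits
    best over the entire utterance, so the pairing cannot flip mid-signal.
    """
    n_speakers = estimates.shape[0]
    best = np.inf
    for perm in permutations(range(n_speakers)):
        # pair estimate s with reference perm[s]; average over all pairs
        mse = np.mean((estimates - references[list(perm)]) ** 2)
        best = min(best, mse)
    return best
```

Because the minimum is taken once per utterance rather than per frame, a perfect separation that merely labels the speakers in the "wrong" order still incurs zero loss.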
Treatise on Hearing: The Temporal Auditory Imaging Theory Inspired by Optics and Communication
A new theory of mammalian hearing is presented, which accounts for the
auditory image in the midbrain (inferior colliculus) of objects in the
acoustical environment of the listener. It is shown that the ear is a temporal
imaging system that comprises three transformations of the envelope functions:
cochlear group-delay dispersion, cochlear time lensing, and neural group-delay
dispersion. These elements are analogous to the optical transformations in
vision of diffraction between the object and the eye, spatial lensing by the
lens, and second diffraction between the lens and the retina. Unlike the eye,
it is established that the human auditory system is naturally defocused, so
that coherent stimuli are unaffected by the defocus, whereas completely
incoherent stimuli are impacted by it and may be blurred by design. It is
argued that the auditory system can use this differential focusing to enhance
or degrade the images of real-world acoustical objects that are partially
coherent. The theory is founded on coherence and temporal imaging theories that
were adopted from optics. In addition to the imaging transformations, the
corresponding inverse-domain modulation transfer functions are derived and
interpreted with consideration to the nonuniform neural sampling operation of
the auditory nerve. These ideas are used to rigorously introduce the concepts of
sharpness and blur in auditory imaging, auditory aberrations, and auditory
depth of field. In parallel, ideas from communication theory are used to show
that the organ of Corti functions as a multichannel phase-locked loop (PLL)
that constitutes the point of entry for auditory phase locking and hence
conserves the signal coherence. It provides an anchor for a dual coherent and
noncoherent auditory detection in the auditory brain that culminates in
auditory accommodation. Implications for hearing impairments are discussed as
well.

Comment: 603 pages, 131 figures, 13 tables, 1570 references.
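The phase-locked-loop analogy above can be illustrated with a toy digital PLL using a proportional-integral loop filter. This is a generic textbook PLL sketch tracking a sinusoid's phase, not a model of the organ of Corti or of the multichannel structure the theory proposes; the gains kp and ki are arbitrary illustrative values:

```python
import numpy as np

def pll_track(x, f0, fs, kp=0.1, ki=0.01):
    """Toy digital PLL with a proportional-integral loop filter (sketch).

    Tracks the phase of a roughly sinusoidal input x sampled at rate fs,
    starting from a nominal frequency f0 (Hz). Returns the tracked phase
    at each sample. Once locked, the local oscillator's phase increments
    follow the input frequency.
    """
    phase = 0.0
    nominal_step = 2 * np.pi * f0 / fs  # phase increment at f0
    integ = 0.0                         # integrator state of the loop filter
    phases = np.empty_like(x)
    for n, s in enumerate(x):
        # phase detector: multiply input by quadrature of the local
        # oscillator; low-frequency part is proportional to the phase error
        err = s * -np.sin(phase)
        integ += ki * err
        phase += nominal_step + kp * err + integ
        phases[n] = phase
    return phases
```

When driven with a 445 Hz tone from a 440 Hz nominal frequency, the loop pulls in within a fraction of a second of signal, after which the average per-sample phase increment matches the input frequency.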