
    Transfer Effect of Speech-sound Learning on Auditory-motor Processing of Perceived Vocal Pitch Errors

    Speech perception and production are intimately linked. There is evidence that speech motor learning results in changes to auditory processing of speech; whether speech motor control benefits from perceptual learning in speech, however, remains unclear. This event-related potential study investigated whether speech-sound learning can modulate the processing of feedback errors during vocal pitch regulation. Mandarin speakers were trained to perceive five Thai lexical tones while learning to associate pictures with spoken words over 5 days. Before and after training, participants produced sustained vowel sounds while hearing their vocal pitch feedback unexpectedly perturbed. Compared with the pre-training session, the magnitude of vocal compensation decreased significantly for the control group at the post-training session but remained consistent for the trained group. At the neural level, the trained group had smaller and faster N1 responses to pitch perturbations and exhibited enhanced P2 responses that correlated significantly with their learning performance. These findings indicate that the cortical processing of vocal pitch regulation can be shaped by learning new speech-sound associations, suggesting that perceptual learning in speech can transfer to, and facilitate, the neural mechanisms underlying the online monitoring of auditory feedback during vocal production.
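For readers unfamiliar with how such ERP components are quantified, the sketch below shows one conventional way to measure N1 and P2 amplitudes and latencies from perturbation-locked EEG epochs. It is a minimal illustration, not the study's actual pipeline; the sampling rate, baseline duration, and peak search windows are assumptions.

```python
# Minimal sketch: average epochs into an ERP, then find the N1 (negative)
# and P2 (positive) peaks in conventional latency windows. All parameters
# below are illustrative assumptions, not values from the study.
import numpy as np

FS = 1000   # sampling rate in Hz (assumed)
T0 = 0.2    # epoch assumed to start 200 ms before perturbation onset

def erp_components(epochs):
    """epochs: (n_trials, n_samples) array -> N1/P2 amplitudes and latencies."""
    erp = epochs.mean(axis=0)                  # grand average across trials
    t = np.arange(erp.size) / FS - T0          # time axis in seconds

    def peak(lo, hi, sign):
        win = (t >= lo) & (t <= hi)
        idx = np.argmax(sign * erp[win])       # sign=-1 finds the negative peak
        return erp[win][idx], t[win][idx]

    n1_amp, n1_lat = peak(0.08, 0.18, sign=-1) # N1: negative peak ~80-180 ms
    p2_amp, p2_lat = peak(0.16, 0.28, sign=+1) # P2: positive peak ~160-280 ms
    return {"N1": (n1_amp, n1_lat), "P2": (p2_amp, p2_lat)}
```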

    Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation

    Whispering is a natural, unphonated, secondary aspect of speech communication for most people. However, it is the primary mechanism of communication for some speakers with impaired voice production mechanisms, such as partial laryngectomees, as well as for those prescribed voice rest, which often follows surgery or damage to the larynx. Unlike most people, who choose when to whisper and when not to, these speakers may have little choice but to rely on whispers for much of their daily vocal interaction. Even though most speakers will whisper at times, and some speakers can only whisper, the majority of today’s computational speech technology systems assume or require phonated speech. This article considers the conversion of whispers into natural-sounding phonated speech as a noninvasive prosthetic aid for people with voice impairments who can only whisper. As a by-product, the technique is also useful for unimpaired speakers who choose to whisper. Speech reconstruction systems can be classified into those that require training and those that do not. Among the latter, a recent parametric reconstruction framework is explored and then enhanced through a refined estimation of plausible pitch from weighted formant differences. The improved reconstruction framework, with the proposed formant-derived artificial pitch modulation, is validated through subjective and objective comparison tests alongside state-of-the-art alternatives.
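The general idea of formant-derived pitch modulation can be illustrated in a few lines: frame-to-frame changes in the lower formants are combined with fixed weights and used to modulate a speaker-typical base pitch. The sketch below is a rough illustration under assumed weights, base pitch, gain, and smoothing; the paper's actual estimator may differ substantially.

```python
# Rough sketch of deriving a plausible F0 contour from formant movement.
# The weights, base pitch, gain, and smoothing below are all assumptions.
import numpy as np

def plausible_pitch(formants, f0_base=120.0, weights=(0.6, 0.3, 0.1), gain=0.05):
    """formants: (n_frames, 3) array of F1-F3 in Hz -> F0 contour in Hz."""
    # per-frame formant change (first frame padded so shapes match)
    deltas = np.diff(formants, axis=0, prepend=formants[:1])
    mod = (deltas * np.asarray(weights)).sum(axis=1)      # weighted difference
    mod = np.convolve(mod, np.ones(5) / 5, mode="same")   # light smoothing
    f0 = f0_base * (1.0 + gain * np.tanh(mod / 100.0))    # bounded modulation
    return f0
```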

    Development of Kinematic Templates for Automatic Pronunciation Assessment Using Acoustic-to-Articulatory Inversion

    Computer-aided pronunciation training (CAPT) is a subcategory of computer-aided language learning (CALL) that deals with the correction of mispronunciation during language learning. For a CAPT system to be effective, it must provide useful and informative feedback that is comprehensive, qualitative, quantitative, and corrective. While the majority of modern systems address the first three aspects of feedback, most do not provide corrective feedback. As part of the National Science Foundation (NSF) funded study “RI: Small: Speaker Independent Acoustic-Articulator Inversion for Pronunciation Assessment”, the Marquette Speech and Swallowing Lab and Marquette Speech and Signal Processing Lab are conducting a pilot study on the feasibility of using acoustic-to-articulatory inversion for CAPT. In order to evaluate the results of a speaker’s acoustic-to-articulatory inversion to determine pronunciation accuracy, kinematic templates are required. The templates represent the vowels, consonant clusters, and stress characteristics of a typical American English (AE) speaker in the midsagittal plane. The Marquette University electromagnetic articulography Mandarin-accented English (EMA-MAE) database, which contains acoustic and kinematic speech data for 40 speakers (20 of whom are native AE speakers), provides the data used to form the kinematic templates. The objective of this work is the development and implementation of these templates. The data provided in the EMA-MAE database are analyzed in detail, and the information obtained from the analysis is used to develop the kinematic templates. The vowel templates are designed as sets of concentric confidence ellipses, which specify (in the midsagittal plane) the ranges of tongue and lip positions corresponding to correct pronunciation. These ranges were defined using the typical articulator positioning of all English speakers of the EMA-MAE database. The data from these English speakers were also used to model the magnitude, speed history, movement pattern, and duration (MSTD) features of each consonant cluster in the EMA-MAE corpus. Cluster templates were designed as sets of average MSTD parameters across English speakers for each cluster. Finally, English stress characteristics were similarly modeled as a set of average magnitude, speed, and duration parameters across English speakers. The kinematic templates developed in this work, while still in early stages, form the groundwork for assessment of features returned by the acoustic-to-articulatory inversion system. This in turn allows for assessment of articulatory inversion as a pronunciation training tool.
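As an illustration of the vowel-template idea, the sketch below builds concentric confidence ellipses from pooled 2-D midsagittal articulator positions and checks whether a new production falls inside a given region. It is a generic statistical construction under assumed details (a Gaussian model with chi-square radii), not the study's exact procedure.

```python
# Minimal sketch: concentric confidence ellipses over 2-D articulator
# positions (e.g., tongue-tip x/y), assuming the positions are roughly
# Gaussian. Levels and the membership test are illustrative.
import numpy as np
from scipy.stats import chi2

def confidence_ellipses(points, levels=(0.5, 0.75, 0.95)):
    """points: (n, 2) positions -> list of (center, full axis lengths, angle)."""
    mean = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)             # principal axes of the cloud
    angle = np.degrees(np.arctan2(evecs[1, 1], evecs[0, 1]))
    ellipses = []
    for p in levels:
        r = np.sqrt(chi2.ppf(p, df=2))             # Mahalanobis radius for level p
        ellipses.append((mean, 2 * r * np.sqrt(evals), angle))
    return ellipses

def inside(point, mean, cov, level=0.95):
    """Check whether a new production falls within the given confidence region."""
    d2 = (point - mean) @ np.linalg.inv(cov) @ (point - mean)
    return d2 <= chi2.ppf(level, df=2)
```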

    Towards Optimal Testing of Auditory Memory: Methodological development of recording of the mismatch negativity (MMN) of the auditory event-related potential (ERP)

    The overlapping sound pressure waves that enter our brain via the ears and auditory nerves must be organized into a coherent percept. Modelling the regularities of the auditory environment and detecting unexpected changes in these regularities, even in the absence of attention, is a necessary prerequisite for orienting towards significant information and, for instance, for speech perception and communication. The processing of auditory information, in particular the detection of changes in the regularities of the auditory input, gives rise to neural activity in the brain that is seen as a mismatch negativity (MMN) response of the event-related potential (ERP) recorded by electroencephalography (EEG).

As the recording of the MMN requires neither a subject's behavioural response nor attention towards the sounds, it can be performed even with subjects who have problems communicating or difficulty performing a discrimination task, for example aphasic and comatose patients, newborns, and even fetuses. With the MMN one can thus follow the evolution of central auditory processing from the very early, often critical stages of development, and also in subjects who cannot be examined with the more traditional behavioural measures of auditory discrimination. Indeed, recent studies show that central auditory processing, as indicated by the MMN, is affected in different clinical populations, such as schizophrenics, as well as during normal aging and abnormal childhood development. Moreover, the processing of auditory information can be selectively impaired for certain auditory attributes (e.g., sound duration, frequency) and can also depend on the context of the sound changes (e.g., speech or non-speech).

Although its advantages over behavioural measures are undeniable, a major obstacle to the larger-scale routine use of the MMN method, especially in clinical settings, is the relatively long duration of its measurement: typically, approximately 15 minutes of recording time is needed to measure the MMN for a single auditory attribute, so recording a complete central auditory processing profile consisting of several auditory attributes would require from one hour to several hours. In this research, I have contributed to the development of new, fast multi-attribute MMN recording paradigms in which several types and magnitudes of sound changes are presented in both speech and non-speech contexts, in order to obtain a comprehensive profile of auditory sensory memory and discrimination accuracy in a short measurement time (altogether approximately 15 minutes for 5 auditory attributes). The speed of the paradigms makes them highly attractive for clinical research, their reliability brings fidelity to longitudinal studies, and the language context is especially suitable for studies on language impairments such as dyslexia and aphasia. In addition, I have presented an even more ecological paradigm, in which the MMN responses are recorded entirely without a repetitive standard tone, a result that is also of interest for the theory of the MMN. All in all, these paradigms contribute to the development of the theory of auditory perception and increase the feasibility of MMN recordings in both basic and clinical research.
Moreover, they have already proven useful in studying, for instance, dyslexia, Asperger syndrome, and schizophrenia.
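To make the multi-attribute idea concrete, the sketch below generates a stimulus sequence in the spirit of such fast multi-feature paradigms: standards alternate with deviants, and consecutive deviants always differ in type, so a single run probes several auditory attributes at once. The attribute names, pair count, and no-repeat rule are illustrative assumptions, not the thesis's exact design.

```python
# Minimal sketch of a multi-feature MMN stimulus sequence: every other
# stimulus is a deviant, and adjacent deviants never share a type.
import random

DEVIANTS = ["duration", "frequency", "intensity", "location", "gap"]

def multi_feature_sequence(n_pairs=500, seed=0):
    """Return a list like ['standard', 'duration', 'standard', 'gap', ...]."""
    rng = random.Random(seed)
    seq, last = [], None
    for _ in range(n_pairs):
        dev = rng.choice([d for d in DEVIANTS if d != last])  # no repeats
        seq += ["standard", dev]
        last = dev
    return seq
```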

    Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection

    Background: Voice disorders affect patients profoundly, and acoustic tools can potentially measure voice function objectively. Disordered sustained vowels exhibit wide-ranging phenomena, from nearly periodic to highly complex, aperiodic vibrations, and increased "breathiness". Modelling and surrogate data studies have shown significant nonlinear and non-Gaussian random properties in these sounds. Nonetheless, existing tools are limited to analysing voices displaying near periodicity, and do not account for this inherent biophysical nonlinearity and non-Gaussian randomness, often using linear signal processing methods insensitive to these properties. They do not directly measure the two main biophysical symptoms of disorder: complex nonlinear aperiodicity, and turbulent, aeroacoustic, non-Gaussian randomness. Often these tools cannot be applied to more severe disordered voices, limiting their clinical usefulness.

Methods: This paper introduces two new tools to speech analysis: recurrence and fractal scaling, which overcome the range limitations of existing tools by addressing directly these two symptoms of disorder, together reproducing a "hoarseness" diagram. A simple bootstrapped classifier then uses these two features to distinguish normal from disordered voices.
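As a rough illustration of the fractal scaling side of this approach, the sketch below implements a basic detrended fluctuation analysis (DFA), a standard way to estimate a scaling exponent from a time series. The window sizes are arbitrary assumptions, and the paper's recurrence measure would be computed separately.

```python
# Minimal sketch of detrended fluctuation analysis (DFA): integrate the
# signal, detrend it in windows of increasing size, and fit the slope of
# log-fluctuation vs. log-scale. Scales below are illustrative.
import numpy as np

def dfa_exponent(x, scales=(16, 32, 64, 128, 256)):
    """Return the DFA scaling exponent of a 1-D signal x."""
    y = np.cumsum(x - np.mean(x))                # integrated profile
    flucts = []
    for s in scales:
        rms = []
        for i in range(len(y) // s):
            seg = y[i * s:(i + 1) * s]
            t = np.arange(s)
            trend = np.polyval(np.polyfit(t, seg, 1), t)  # local linear trend
            rms.append(np.sqrt(np.mean((seg - trend) ** 2)))
        flucts.append(np.mean(rms))
    # slope of log F(s) vs log s is the scaling exponent
    return np.polyfit(np.log(scales), np.log(flucts), 1)[0]
```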

Results: On a large database of subjects with a wide variety of voice disorders, these new techniques can distinguish normal from disordered cases, using quadratic discriminant analysis, with an overall correct classification performance of 91.8% ± 2.0%. The true positive classification performance is 95.4% ± 3.2%, and the true negative performance is 91.5% ± 2.3% (95% confidence). This is shown to outperform all combinations of the most popular classical tools.

Conclusions: Given the very large number of arbitrary parameters and computational complexity of existing techniques, these new techniques are far simpler and yet achieve clinically useful classification performance using only a basic classification technique. They do so by exploiting the inherent nonlinearity and turbulent randomness in disordered voice signals. They are widely applicable to the whole range of disordered voice phenomena by design. These new measures could therefore be used for a variety of practical clinical purposes.
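For concreteness, here is a minimal sketch of the classification stage described above: quadratic discriminant analysis on the two features, with a simple bootstrap estimate of the spread of classification performance. The resampling scheme shown (out-of-bag scoring) is an illustrative assumption, not necessarily the paper's exact validation protocol.

```python
# Minimal sketch: QDA on two voice features with a bootstrap performance
# estimate. Feature values are placeholders supplied by the caller.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def bootstrap_accuracy(X, y, n_boot=1000, seed=0):
    """X: (n, 2) [recurrence, scaling] features; y: 0=normal, 1=disordered."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))      # resample with replacement
        if len(np.unique(y[idx])) < 2:
            continue                               # need both classes to fit
        clf = QuadraticDiscriminantAnalysis().fit(X[idx], y[idx])
        oob = np.setdiff1d(np.arange(len(y)), idx) # out-of-bag test set
        if oob.size:
            accs.append(clf.score(X[oob], y[oob]))
    return np.mean(accs), np.percentile(accs, [2.5, 97.5])  # mean, 95% CI
```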

    Cortical and subcortical speech-evoked responses in young and older adults: Effects of background noise, arousal states, and neural excitability

    This thesis investigated how the human brain processes speech signals across a wide age range in the sensory auditory system, using electroencephalography (EEG). It focused on two types of speech-evoked phase-locked responses: (i) cortical responses (theta-band phase-locked responses) that reflect processing of the low-frequency, slowly-varying envelopes of speech; and (ii) subcortical/peripheral responses (frequency-following responses; FFRs) that reflect encoding of speech periodicity and temporal fine structure information. The aim was to elucidate, through three studies, how these neural activities are affected by different internal (aging, hearing loss, level of arousal, and neural excitability) and external (background noise) factors in daily life. Study 1 investigated theta-band phase-locking and FFRs in noisy environments in young and older adults: how aging and hearing loss affect these activities in quiet and noisy environments, and how these activities are associated with speech-in-noise perception. The results showed that aging and hearing loss affect speech-evoked phase-locked responses through different mechanisms, and that the effects of aging on cortical and subcortical activities play different roles in speech-in-noise perception. Study 2 investigated how the level of arousal, or consciousness, affects phase-locked responses in young and older adults. The results showed that both theta-band phase-locking and FFRs decrease as the level of arousal decreases. It was further found that the neuro-regulatory role of sleep spindles on theta-band phase-locking is distinct between young and older adults, indicating that the mechanisms of neuro-regulation for phase-locked responses in different arousal states are age-dependent. Study 3 established a causal relationship between auditory cortical excitability and FFRs using combined transcranial direct current stimulation (tDCS) and EEG. FFRs were measured before and after tDCS was applied over the auditory cortices. The results showed that changes in the neural excitability of the right auditory cortex can alter FFR magnitudes along the contralateral pathway. This finding has important theoretical and clinical implications, as it causally links functions of the auditory cortex with the neural encoding of speech periodicity. Taken together, the findings of this thesis advance our understanding of how speech signals are processed via neural phase-locking in everyday life across the lifespan.
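As an illustration of how FFR strength is often quantified, the sketch below averages epochs, windows the steady-state portion, and reads off the spectral magnitude at the stimulus fundamental frequency. The sampling rate, F0, and analysis window are assumptions rather than the thesis's actual parameters.

```python
# Minimal sketch: FFR magnitude at the stimulus F0 from averaged EEG.
# FS, F0, and the analysis window are illustrative assumptions.
import numpy as np

FS = 16384   # EEG sampling rate in Hz (assumed)
F0 = 100.0   # stimulus fundamental frequency in Hz (assumed)

def ffr_f0_magnitude(epochs, t_start=0.01, t_end=0.15):
    """epochs: (n_trials, n_samples) -> spectral magnitude at F0 (a.u.)."""
    avg = epochs.mean(axis=0)
    seg = avg[int(t_start * FS):int(t_end * FS)]   # steady-state portion
    seg = seg * np.hanning(seg.size)               # taper to reduce leakage
    spec = np.abs(np.fft.rfft(seg)) / seg.size
    freqs = np.fft.rfftfreq(seg.size, d=1 / FS)
    return spec[np.argmin(np.abs(freqs - F0))]     # bin nearest to F0
```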

    Temporal Lobe Epilepsy Alters Auditory-motor Integration For Voice Control

    Temporal lobe epilepsy (TLE) is the most common drug-refractory focal epilepsy in adults. Previous research has shown that patients with TLE exhibit decreased performance in listening to speech sounds and deficits in the cortical processing of auditory information. Whether TLE compromises auditory-motor integration for voice control, however, remains largely unknown. To address this question, event-related potentials (ERPs) and vocal responses to vocal pitch errors (1/2 or 2 semitones upward) heard in auditory feedback were compared across 28 patients with TLE and 28 healthy controls. Patients with TLE produced significantly larger vocal responses but smaller P2 responses than healthy controls. Moreover, patients with TLE exhibited a positive correlation between vocal response magnitude and baseline voice variability, and a negative correlation between P2 amplitude and disease duration. Graph-based network analyses revealed a disrupted neuronal network in patients with TLE, with significantly increased clustering coefficients and path lengths as compared to healthy controls. These findings provide strong evidence that TLE is associated with an atypical integration of the auditory and motor systems for vocal pitch regulation, and that the functional networks that support the auditory-motor processing of pitch feedback errors differ between patients with TLE and healthy controls.
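The two network metrics named above are standard graph-theory quantities. The sketch below computes them with networkx from a thresholded connectivity matrix; the threshold value and the use of binary edges are illustrative assumptions, not the study's actual analysis settings.

```python
# Minimal sketch: clustering coefficient and characteristic path length
# from a thresholded functional-connectivity matrix.
import numpy as np
import networkx as nx

def graph_metrics(conn, threshold=0.3):
    """conn: (n, n) symmetric connectivity matrix -> (clustering, path length)."""
    adj = (np.abs(conn) > threshold).astype(int)   # binarize edges (assumption)
    np.fill_diagonal(adj, 0)                       # no self-loops
    G = nx.from_numpy_array(adj)
    clustering = nx.average_clustering(G)
    # characteristic path length is defined only for a connected graph
    path_length = (nx.average_shortest_path_length(G)
                   if nx.is_connected(G) else np.inf)
    return clustering, path_length
```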

    Analysing Changes in the Acoustic Features of the Human Voice to Detect Depression amongst Biological Females in Higher Education

    Depression significantly affects a large percentage of the population, with young adult females being one of the most at-risk demographics. Concurrently, there is a growing demand on healthcare, and with sufficient resources often unavailable to diagnose depression, new diagnostic methods are needed that are both cost-effective and accurate. The presence of depression significantly affects certain acoustic features of the human voice: acoustic features have been found to exhibit subtle changes, beyond the perception of the human auditory system, when an individual has depression. With advances in speech processing, these subtle changes can be observed by machines, and by measuring them the human voice can be analysed to identify acoustic features that correlate with depression. The implementation of voice-based diagnosis would both reduce the burden on healthcare and ensure those with depression are diagnosed in a timely fashion, allowing them quicker access to treatment. This research project presents an analysis of voice data from 17 biological females, aged 20 to 26 years and in higher education, as a means to detect depression. Eight participants were considered healthy with no history of depression, whilst the other nine currently had depression. Participants performed two vocal tasks: sustaining sounds for a period of time and reading back a passage of speech. Six acoustic features were then measured from the voice data to determine whether they can be utilised as diagnostic indicators of depression. The main finding of this study was that one of the acoustic features measured showed significant differences between depressed and healthy individuals.
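As an illustration of the kind of acoustic measures involved, the sketch below extracts a mean F0, F0 variability, and a jitter-like measure of cycle-to-cycle pitch instability from a sustained-vowel recording using librosa. This feature set is hypothetical: the abstract does not name the six features actually measured.

```python
# Minimal sketch: simple acoustic measures from a sustained-vowel recording.
# The specific features and pitch bounds are illustrative assumptions.
import numpy as np
import librosa

def vowel_features(path):
    y, sr = librosa.load(path, sr=None)
    f0, _, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)  # frame-wise pitch
    f0 = f0[~np.isnan(f0)]                    # keep voiced frames only
    periods = 1.0 / f0                        # frame-wise period estimates
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)
    return {
        "f0_mean_hz": float(np.mean(f0)),
        "f0_sd_hz": float(np.std(f0)),
        "jitter_like": float(jitter),         # relative period perturbation
    }
```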