
    Prosodic Focus Interpretation in Spectrotemporally Degraded Speech by Non-Native Listeners

    Purpose: This study assesses how the spectrotemporal degradations that can occur in the sound transmission of a cochlear implant (CI) may influence the ability of non-native listeners to recognize the intended meaning of utterances based on the position of the prosodically focused word. Previous research suggests that perceptual accuracy and listening effort are negatively affected by CI processing (or CI simulations) and by presenting speech in a non-native language, across a number of tasks and circumstances. How these two factors interact to affect prosodic focus interpretation, however, remains unclear. Method: In an online experiment, normal-hearing (NH) adolescent and adult native Dutch learners of English and a small control group of NH native English adolescents listened to CI-simulated (eight-channel noise-band vocoded) and non–CI-simulated English sentences differing in prosodically marked focus. To assess perceptual accuracy, listeners indicated which of four possible context questions the speaker was answering. To assess listening effort, a dual-task paradigm was used with a secondary free recall task. Results: Prosodic focus interpretation was significantly less accurate in the CI-simulated condition than in the non–CI-simulated condition, but listening effort was not increased. Moreover, there was no interaction between the influence of the degraded CI-simulated speech signal and listening group in either perceptual accuracy or listening effort. Conclusion: Non-native listeners are not more strongly affected by spectrotemporal degradations than native listeners, and less proficient non-native listeners are not more strongly affected by these degradations than more proficient non-native listeners.
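The eight-channel noise-band vocoding used here is a standard CI simulation: the signal is split into a few frequency bands, each band's slowly varying amplitude envelope is extracted, and those envelopes modulate band-limited noise carriers. A minimal sketch in Python (band edges, filter orders, and the envelope cutoff are illustrative choices, not the study's exact settings):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(signal, fs, n_channels=8, lo=100.0, hi=7000.0, env_cutoff=160.0):
    """Split `signal` into log-spaced analysis bands, extract each band's
    amplitude envelope (rectify + low-pass), and use the envelopes to
    modulate band-limited noise carriers."""
    edges = np.geomspace(lo, hi, n_channels + 1)  # logarithmic band edges
    env_sos = butter(2, env_cutoff, btype="low", fs=fs, output="sos")
    rng = np.random.default_rng(0)
    out = np.zeros(len(signal))
    for k in range(n_channels):
        band_sos = butter(4, [edges[k], edges[k + 1]], btype="band",
                          fs=fs, output="sos")
        band = sosfiltfilt(band_sos, signal)           # analysis band
        env = sosfiltfilt(env_sos, np.abs(band))       # envelope: rectify + low-pass
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(signal)))
        out += np.clip(env, 0.0, None) * carrier       # envelope-modulated noise band
    return out
```

Because the noise carriers discard temporal fine structure, pitch cues such as f0 movement (the main acoustic correlate of prosodic focus) survive only weakly in the vocoded signal.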

    Recognition and cortical haemodynamics of vocal emotions: an fNIRS perspective

    Normal-hearing (NH) listeners rely heavily on variations in the fundamental frequency (F0) of speech to identify vocal emotions. Without reliable F0 cues, as is the case for cochlear implant users, listeners’ ability to extract emotional meaning from speech is reduced. This thesis describes the development of an objective measure of vocal emotion recognition. The program of three experiments investigates: 1) NH listeners’ abilities to use F0, intensity, and speech-rate cues to recognise emotions; 2) cortical activity associated with individual vocal emotions, assessed using functional near-infrared spectroscopy (fNIRS); and 3) cortical activity evoked by vocal emotions in natural speech and in speech with uninformative F0, also assessed using fNIRS.

    A MODELING PERSPECTIVE ON DEVELOPING NATURALISTIC NEUROPROSTHETICS USING ELECTRICAL STIMULATION

    Direct electrical stimulation of neurons has been an important tool for understanding the brain since the field of neuroscience began. Electrical stimulation was first used to understand sensation and the mapping of the brain, and more recently function; as our understanding of neurological disorders has advanced, it has become an increasingly important tool for interacting with neurons to design and carry out treatments. The hardware for electrical stimulation has greatly improved during the last century, allowing smaller-scale, implantable treatments for a variety of disorders, from loss of sensation (hearing, vision, balance) to Parkinson’s disease and depression. Owing to the clinical success of these treatments, there are millions of neural implant users around the globe, and interest in medical implants and implants for human enhancement is only growing. However, present neural implants restore only limited function compared with natural systems. A limiting factor in the advancement of electrical stimulation-based treatments has been the restriction to charge-balanced, typically sub-millisecond pulses in order to interact safely with the brain, a consequence of the reliance on durable metal electrodes. Materials science developments have led to more flexible electrodes that can deliver more charge safely, but the focus has been on the density of implanted electrodes rather than on changing the waveform of electrical stimulation. Recently, the Fridman lab at Johns Hopkins University developed Freeform Stimulation (FS), an implantable device that uses a microfluidic H-bridge architecture to safely deliver current for prolonged periods of time and that is not restricted to charge-balanced waveforms.
In this work, we refer to these non-restricted waveforms as galvanic stimulation, an umbrella term that encompasses direct current, sinusoidal current, and other forms of non-charge-balanced current. The invention of the FS has opened the door to the use of galvanic stimulation in neural implants, prompting an exploration of the effects of local galvanic stimulation on neural function. Galvanic stimulation was used in neuroscience before concerns arose about safe long-term interaction with neurons; unlike in most systems, it has historically been applied to the vestibular system internally, and it is still used there today in the form of transcutaneous stimulation. Historic and recent studies confirm that galvanic stimulation of the vestibular system has more naturalistic effects on neural spike timing and on induced behavior (eye velocities) than pulsatile stimulation, the current standard in neural implants. Recent vestibular stimulation studies with pulses also show evidence of suboptimal responses to pulsatile stimulation, in which suprathreshold pulses induce only about half as many action potentials as pulses delivered. This combination of results prompted an investigation of the differences between galvanic and pulsatile electrical stimulation in the vestibular system. The research in this dissertation uses detailed biophysical modeling of single vestibular neurons to investigate the differences in the biophysical mechanisms of galvanic and pulsatile stimulation. In Chapter 2, a more accurate model of a vestibular afferent is constructed from an existing model and used to provide a theory for how galvanic stimulation produces a number of known effects on vestibular afferents.
In Chapter 3, the same model is used to explain why pulsatile stimulation produces fewer action potentials than expected; the results show that pulse amplitude, pulse rate, and the spontaneous activity of neurons at the axon interact in ways that lead to several non-monotonic relationships between pulse parameters and induced firing rate. Equations are derived to correct for these non-monotonic relationships and produce intended firing rates. Chapter 4 focuses on how to create a neural implant that induces more naturalistic firing, using the scientific understanding from Chapters 2 and 3 together with machine learning. The work concludes by describing the implications of these findings for interacting with neurons at population and network scales, and how this may make electrical stimulation increasingly suited for treating complex network-level and psychiatric disorders.

    Determination of Optimum Parameters for Cochlear Implant Speech Processors Using Objective Measures (Koklear İmplant Konuşma İşlemcileri için Optimum Parametrelerin Objektif Ölçütler Kullanılarak Belirlenmesi)

    In a cochlear implant (CI) speech processor, several parameters, such as the number of channels, bandwidths, rectification type, and envelope cutoff frequency, play an important role in delivering intelligible speech. Effective, general-purpose CI processing strategies have long been a research topic. This study aims to determine optimum parameters for CI users by comparing different channel numbers (4, 8, 12, 16, and 22), rectification types (half- and full-wave), and cutoff frequencies (200, 250, 300, 350, and 400 Hz). The CI approaches were tested on Turkish sentences taken from the METU database and evaluated with an objective quality measure, the weighted spectral slope (WSS), and with the short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) measures. Experimental results show that a 400 Hz cutoff frequency, a full-wave rectifier, and 16 channels give better quality and higher intelligibility scores than the other CI configurations according to the STOI, PESQ, and WSS results. The proposed CI approach enables CI users to perceive 91% of the vocoded Turkish speech output. © 2022, TUBITAK. All rights reserved.
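The rectification type and envelope cutoff frequency varied in this study belong to the per-channel envelope-extraction stage of the processor. A sketch of that stage (the function name and filter order are illustrative, not taken from the paper):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def extract_envelope(band, fs, rectification="full", cutoff=400.0):
    """Per-channel envelope extraction: rectify the band-limited signal,
    then low-pass filter at `cutoff` Hz (the two parameters the study varies)."""
    if rectification == "full":
        rect = np.abs(band)            # full-wave: magnitude of both half-cycles
    elif rectification == "half":
        rect = np.maximum(band, 0.0)   # half-wave: positive half-cycles only
    else:
        raise ValueError("rectification must be 'half' or 'full'")
    sos = butter(2, cutoff, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, rect)
```

Full-wave rectification passes roughly twice the envelope energy of half-wave rectification, and a higher cutoff (e.g., 400 Hz) preserves faster envelope fluctuations, including some periodicity cues, which is consistent with the parameter combination the study found best.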

    IberSPEECH 2020: XI Jornadas en Tecnología del Habla and VII Iberian SLTech

    IberSPEECH2020 is a two-day event bringing together the best researchers and practitioners in speech and language technologies in Iberian languages to promote interaction and discussion. The organizing committee has planned a wide variety of scientific and social activities, including technical paper presentations, keynote lectures, presentations of projects, laboratory activities, recent PhD theses, discussion panels, a round table, and awards for the best thesis and papers. The program of IberSPEECH2020 includes a total of 32 contributions, presented across 5 oral sessions, a PhD session, and a projects session. To ensure the quality of all the contributions, each submitted paper was reviewed by three members of the scientific review committee. All the papers in the conference will be accessible through the International Speech Communication Association (ISCA) Online Archive. Paper selection was based on the scores and comments provided by the scientific review committee, which includes 73 researchers from different institutions (mainly from Spain and Portugal, but also from France, Germany, Brazil, Iran, Greece, Hungary, the Czech Republic, Ukraine, and Slovenia). Furthermore, it has been confirmed that extended versions of selected papers will be published in a special issue of the journal Applied Sciences, “IberSPEECH 2020: Speech and Language Technologies for Iberian Languages”, published by MDPI with full open access. In addition to regular paper sessions, the IberSPEECH2020 scientific program features the ALBAYZIN evaluation challenge session. Red Española de Tecnologías del Habla. Universidad de Valladolid.

    The influence of channel and source degradations on intelligibility and physiological measurements of effort

    Despite the fact that everyday listening is compromised by acoustic degradations, individuals show a remarkable ability to understand degraded speech. However, recent trends in speech perception research emphasise the cognitive load imposed by degraded speech on both normal-hearing and hearing-impaired listeners. The perception of degraded speech is often studied through channel degradations such as background noise; source degradations determined by talkers’ acoustic-phonetic characteristics have been studied to a lesser extent, especially in the context of listening effort models. Similarly, little attention has been given to speaking effort, i.e., the effort experienced by talkers when producing speech under channel degradations. This thesis aims to provide a holistic understanding of communication effort, i.e., taking into account both listener and talker factors. Three pupillometry studies are presented. In the first study, speech was recorded from 16 Southern British English speakers and presented to normal-hearing listeners in quiet and in combination with three degradations: noise-vocoding, masking, and time-compression. Results showed that acoustic-phonetic talker characteristics predicted the intelligibility of degraded speech, but not listening effort, as likely indexed by pupil dilation. In the second study, older hearing-impaired listeners were presented with fast time-compressed speech under simulated room acoustics. Intelligibility was kept at high levels. Results showed that both fast speech and reverberant speech were associated with higher listening effort, as suggested by pupillometry. Discrepancies between pupillometry and perceived effort ratings suggest that both methods should be employed in speech perception research to pinpoint processing effort.
While findings from the first two studies support models of degraded speech perception, emphasising the relevance of source degradations, they also have methodological implications for pupillometry paradigms. In the third study, pupillometry was combined with a speech production task, aiming to establish an equivalent of listening effort for talkers: speaking effort. Normal-hearing participants were asked to read and produce speech in quiet or in the presence of different types of masking: stationary and modulated speech-shaped noise, and competing-talker masking. Results indicated that while talkers acoustically enhance their speech more under stationary masking, the larger pupil dilation associated with competing-talker masking reflected higher speaking effort. Results from all three studies are discussed in conjunction with models of degraded speech perception and production. Listening effort models are revisited to incorporate pupillometry results from speech production paradigms. Given the new approach of investigating source factors using pupillometry, methodological issues are discussed as well. The main insight provided by this thesis, i.e., the feasibility of applying pupillometry to situations involving both listener and talker factors, is suggested to guide future research employing naturalistic conversations.
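A common way to quantify effort from pupillometry is a baseline-corrected dilation measure per trial. A minimal sketch, assuming a fixed pre-stimulus baseline window; the window length and the choice of peak versus mean dilation vary across studies and are not taken from this thesis:

```python
import numpy as np

def pupil_dilation(trace, fs, baseline_s=1.0):
    """Baseline-corrected pupil dilation for one trial: subtract the mean
    pupil size in the pre-stimulus baseline window, then take the peak of
    the remaining stimulus-locked trace."""
    trace = np.asarray(trace, dtype=float)
    n_base = int(baseline_s * fs)
    baseline = np.nanmean(trace[:n_base])   # mean size before stimulus onset
    corrected = trace[n_base:] - baseline   # dilation relative to baseline
    return np.nanmax(corrected)             # peak dilation (one common index)
```

Using NaN-aware reductions lets the same code tolerate blink-interpolated samples marked as NaN, a routine preprocessing step in pupillometry pipelines.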

    Meta-Analysis on the Identification of Linguistic and Emotional Prosody in Cochlear Implant Users and Vocoder Simulations

    Objectives: This study quantitatively assesses how cochlear implants (CIs) and vocoder simulations of CIs influence the identification of linguistic and emotional prosody in nontonal languages. By means of meta-analysis, it was explored how accurately CI users and normal-hearing (NH) listeners of vocoder simulations (henceforth: simulation listeners) identify prosody compared with NH listeners of unprocessed speech (henceforth: NH listeners), whether this effect of electric hearing differs between CI users and simulation listeners, and whether the effect of electric hearing is influenced by the type of prosody that listeners identify or by the availability of specific cues in the speech signal. Design: Records were found by searching the PubMed Central, Web of Science, Scopus, Science Direct, and PsycINFO databases (January 2018) using the search terms “cochlear implant prosody” and “vocoder prosody.” Records (published in English) were included that reported results of experimental studies comparing CI users’ and/or simulation listeners’ identification of linguistic and/or emotional prosody in nontonal languages to that of NH listeners (all ages included). Studies that met the inclusion criteria were subjected to a multilevel random-effects meta-analysis. Results: Sixty-four studies reported in 28 records were included in the meta-analysis. The analysis indicated that CI users and simulation listeners were less accurate in correctly identifying linguistic and emotional prosody compared with NH listeners, that the identification of emotional prosody was more strongly compromised by the electric hearing speech signal than linguistic prosody was, and that the low quality of transmission of fundamental frequency (f0) through the electric hearing speech signal was the main cause of compromised prosody identification in CI users and simulation listeners. 
Moreover, results indicated that the accuracy with which CI users and simulation listeners identified linguistic and emotional prosody was comparable, suggesting that vocoder simulations with carefully selected parameters can provide a good estimate of how prosody may be identified by CI users. Conclusions: The meta-analysis revealed a robust negative effect of electric hearing, where CIs and vocoder simulations had a similar negative influence on the identification of linguistic and emotional prosody, which seemed mainly due to inadequate transmission of f0 cues through the degraded electric hearing speech signal of CIs and vocoder simulations.
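For intuition on the pooling step, the classic single-level random-effects estimator is DerSimonian-Laird; the multilevel model used in this meta-analysis additionally handles multiple effect sizes per record, which this sketch omits:

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Pool study effect sizes under a random-effects model using the
    DerSimonian-Laird estimator of between-study variance (tau^2)."""
    effects = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                   # fixed-effect weights
    fe = np.sum(w * effects) / np.sum(w)          # fixed-effect pooled estimate
    q = np.sum(w * (effects - fe) ** 2)           # Cochran's Q heterogeneity statistic
    df = len(effects) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance, floored at 0
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    pooled = np.sum(w_re * effects) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return pooled, se, tau2
```

With homogeneous study effects, tau^2 collapses to zero and the estimator reduces to the fixed-effect average; heterogeneous effects inflate tau^2 and widen the pooled standard error.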

    The Role of Age and Bilingualism on Perception of Vocoded Speech

    This dissertation examines the roles of age and bilingualism in the perception of vocoded speech, in order to determine whether bilingual individuals, children, and bilingual individuals with later ages of second language acquisition show greater difficulty in vocoded speech perception. Measures of language skill and verbal inhibition were also examined in relation to vocoded speech perception. Two studies were conducted, each with two participant language groups: monolingual English speakers and bilingual Spanish-English speakers. The first study also explored the role of age at the time of testing by including both monolingual and bilingual children (8-10 years) and monolingual and bilingual adults (18+ years); as such, this study included four groups in total. Participants were tested on vocoded stimuli simulating speech as perceived through an 8-channel CI in conditions of both deep (0-mm shift) and shallow (6-mm shift) insertion of the electrode array. Between testing trials, participants were trained on the more difficult, 6-mm shift condition. The second study explored the role of age of second language acquisition in native speakers of Spanish (18+ years) first exposed to English at ages ranging from 0 to 12 years; it also included a control group of monolingual English speakers (18+ years). This study examined perception of target lexical items presented either in isolation or at the end of sentences. Stimuli were either unaltered or vocoded to simulate speech as heard through an 8-channel CI at 0-mm shift. Items presented in isolation were divided into levels of difficulty based on word frequency and neighborhood density; target items presented at the ends of sentences were divided into levels of difficulty based on the degree of semantic context provided by the sentence. No effects of age at testing or age of acquisition were found.
In the first study, there was also no effect of language group: all groups improved with training and showed significant improvement between pre- and post-test speech perception scores in both shift conditions. In the second study, all participants were significantly negatively impacted by vocoding; however, bilingual participants showed greater difficulty than their monolingual peers in perceiving vocoded lexical items presented in isolation. This group difference was not found in the sentence conditions, where all participants benefited significantly from greater semantic context. From this, we can conclude that bilingual individuals can make use of semantic context to perceive vocoded speech similarly to their monolingual peers. Neither language skills nor verbal inhibition, as measured in these studies, significantly impacted speech perception scores in any of the tested conditions across groups.
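The 0-mm versus 6-mm shift conditions describe the mismatch between an analysis band's frequency and the cochlear place stimulated by a shallowly inserted electrode array. That place-to-frequency relationship is conventionally modeled with the Greenwood (1990) map for the human cochlea; the sketch below uses the standard constants, treating the mapping itself as given rather than reproducing this dissertation's exact stimuli:

```python
import math

# Greenwood (1990) place-to-frequency map for the human cochlea:
# f = A * (10**(a * x) - k), with A = 165.4, a = 0.06 per mm,
# k = 0.88, and x the distance from the apex in mm.
def greenwood_freq(x_mm):
    return 165.4 * (10.0 ** (0.06 * x_mm) - 0.88)

def greenwood_place(f_hz):
    # Inverse map: characteristic frequency (Hz) -> distance from apex (mm).
    return math.log10(f_hz / 165.4 + 0.88) / 0.06

def shifted_carrier_freq(f_hz, shift_mm):
    """Frequency at the cochlear place reached when a band centered at
    `f_hz` is delivered `shift_mm` further toward the base (shallower insertion)."""
    return greenwood_freq(greenwood_place(f_hz) + shift_mm)
```

A 0-mm shift leaves analysis and carrier bands aligned, while a 6-mm basalward shift moves, for example, a 500 Hz band to a place tuned well above 1 kHz, which is why the shifted condition is the harder one and the target of training.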

    Physiology, Psychoacoustics and Cognition in Normal and Impaired Hearing

    otorhinolaryngology; neurosciences; hearing