Real-time decoding of question-and-answer speech dialogue using human cortical activity.

Author: Chang Edward F
Leonard Matthew K
Makin Joseph G
Moses David A
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and to then decode the utterance's identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate
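
The contextual step described in this abstract, using decoded question likelihoods to update the prior over answers, can be illustrated with a small Bayesian sketch. This is not the authors' decoder: the function names, the question and answer labels, and all probabilities below are hypothetical, and the mixture-prior form is an assumption made only for illustration.

```python
def context_prior(question_probs, answers_given_question):
    """Prior over answers: p(a) = sum over q of p(a | q) * p(q).

    question_probs: decoded question likelihoods, normalized to sum to 1.
    answers_given_question: for each question, the plausible answers and
    their conditional probabilities (hypothetical values).
    """
    prior = {}
    for q, p_q in question_probs.items():
        for a, p_a_given_q in answers_given_question[q].items():
            prior[a] = prior.get(a, 0.0) + p_q * p_a_given_q
    return prior


def answer_posterior(answer_likelihoods, prior):
    """Combine answer likelihoods with the context prior and renormalize."""
    unnorm = {a: lik * prior.get(a, 0.0) for a, lik in answer_likelihoods.items()}
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("no answer has non-zero posterior mass")
    return {a: v / total for a, v in unnorm.items()}


# Hypothetical example: the decoded question is probably about music.
question_probs = {"q_music": 0.8, "q_temperature": 0.2}
answers_given_question = {
    "q_music": {"piano": 0.5, "violin": 0.5},
    "q_temperature": {"hot": 0.5, "cold": 0.5},
}
# Answer likelihoods from neural activity alone favor "hot".
answer_likelihoods = {"piano": 0.3, "violin": 0.2, "hot": 0.4, "cold": 0.1}

prior = context_prior(question_probs, answers_given_question)
posterior = answer_posterior(answer_likelihoods, prior)
best_answer = max(posterior, key=posterior.get)
```

In this toy case the answer likelihoods alone favor "hot", but once the question context is folded in, the most probable answer becomes "piano", which is the kind of effect the abstract attributes to contextual integration.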

eScholarship - University of California

Does Speech-To-Text Assistive Technology Improve the Written Expression of Students with Traumatic Brain Injury?

Author: Noakes Michaela Ann
Publication venue: Duquesne Scholarship Collection
Publication date: 01/01/2017

Traumatic Brain Injury (TBI) outcomes vary by individual due to age at the onset of injury, the location of the injury, and the degree to which the deficits appear to be pronounced, among other factors. As an acquired injury to the brain, the neurophysiological consequences are not homogeneous; they are as varied as the individuals who experience them. Persistent impairments in the executive functions of attention, initiation, planning, organizing, and memory are likely to be present in children with moderate to severe TBIs. Issues with sensory and motor skills, language, auditory or visual sensation changes, and variations in emotional behavior may also be present. Germane to this study, motor dysfunction is a common long-term sequela of TBI that manifests in academic difficulties. Borrowing from the learning disability literature, children with motor dysfunction are likely to have transcription deficits, or deficits related to the fine-motor production of written language. This study aimed to compare the effects of handwriting with an assistive technology (AT) accommodation on the writing performance of three middle school students with TBIs and writing difficulties. The study utilized an alternating treatments design (ATD), comparing the effects of handwriting responses to story prompts to the use of speech-to-text AT to record participant responses. Speech-to-text technology, such as Dragon Naturally Speaking, converts spoken language into a print format on a computer screen with a high degree of accuracy. In theory, because less effort is spent on transcription, there is a reduction in cognitive load, enabling more time to be spent on generation skills, such as idea development, selecting more complex words that might otherwise be difficult to spell, and grammar. Overall, all three participants showed marked improvement with the application of speech-to-text AT.
The results indicate a positive pattern for the AT as an accommodation for these children, who have had mild-to-moderate TBIs, as compared to their written output without the AT accommodation. The findings of this study are robust. Through visual analysis of the results, it is evident that the speech-to-text dictation condition was far superior to the handwriting condition (HW), with effect sizes that ranged from +3.4 to +8.8 across participants, indicating a large treatment effect. Perhaps more impressive was the 100 percent non-overlap of data between the two conditions across participants and dependent variables. The application of speech-to-text AT resulted in significantly improved performance across writing indicators in these students with a history of TBIs. Speech-to-text AT may prove to be an excellent accommodation for children with TBI and fine motor skill deficits. The conclusions drawn from the results of this study indicate that speech-to-text AT was more effective than the handwriting condition for all three participants. With this AT, each of these students improved in the quality, construction, and duration of their written expression, as evidenced by the significant gains in TWW, WSC, and CWS
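
The "100 percent non-overlap of data" reported above can be illustrated with a simple non-overlap calculation. The abstract does not say which non-overlap index was used, so the sketch below assumes the common "percentage of non-overlapping data" form, counting treatment points that exceed the highest comparison point. The function name and the scores are hypothetical and are not data from the study.

```python
def percent_non_overlap(comparison_scores, treatment_scores):
    """Percentage of treatment points above the highest comparison point.

    Assumes higher scores are better (for example, total words written).
    """
    if not comparison_scores or not treatment_scores:
        raise ValueError("both conditions need at least one data point")
    ceiling = max(comparison_scores)
    above = sum(1 for score in treatment_scores if score > ceiling)
    return 100.0 * above / len(treatment_scores)


# Hypothetical session scores for one participant.
handwriting = [10, 12, 11]
speech_to_text = [20, 25, 22]
full_separation = percent_non_overlap(handwriting, speech_to_text)

# A second, partially overlapping hypothetical series for contrast.
partial_separation = percent_non_overlap(handwriting, [12, 25])
```

A value of 100 means that every session in the treatment condition scored above the best session in the comparison condition.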

The Berlin Brain-Computer Interface: Progress Beyond Communication and Control

Author: Acqualagna Laura
Blankertz Benjamin
Curio Gabriel
Dähne Sven
Haufe Stefan
Müller Klaus-Robert
Schultze-Kraft Matthias
Sturm Irene
Ušćumlic Marija
Wenzel Markus
Publication date: 21/11/2016

The combined effect of fundamental results about neurocognitive processes and advancements in decoding mental states from ongoing brain signals has brought forth a whole range of potential neurotechnological applications. In this article, we review our developments in this area and put them into perspective. These examples cover a wide range of maturity levels with respect to their applicability. While we assume we are still a long way from integrating Brain-Computer Interface (BCI) technology into general interaction with computers, or from implementing neurotechnological measures in safety-critical workplaces, results have already been obtained using a BCI as a research tool. In this article, we discuss the reasons why, in some of the prospective application domains, considerable effort is still required to make the systems ready to deal with the full complexity of the real world.

EC/FP7/611570/EU/Symbiotic Mind Computer Interaction for Information Seeking/MindSee
EC/FP7/625991/EU/Hyperscanning 2.0 Analyses of Multimodal Neuroimaging Data: Concept, Methods and Applications/HYPERSCANNING 2.0
DFG, 103586207, GRK 1589: Verarbeitung sensorischer Informationen in neuronalen Systeme

Features of hearing: applications of machine learning to uncover the building blocks of hearing

Author: Weerts Lotte
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/11/2021

Recent advances in machine learning have instigated a renewed interest in using machine learning approaches to better understand human sensory processing. This line of research is particularly interesting for speech research since speech comprehension is uniquely human, which complicates obtaining detailed neural recordings. In this thesis, I explore how machine learning can be used to uncover new knowledge about the auditory system, with a focus on discovering robust auditory features. The resulting increased understanding of the noise robustness of human hearing may help to better assist those with hearing loss and improve Automatic Speech Recognition (ASR) systems. First, I show how computational neuroscience and machine learning can be combined to generate hypotheses about auditory features. I introduce a neural feature detection model with a modest number of parameters that is compatible with auditory physiology. By testing feature detector variants in a speech classification task, I confirm the importance of both well-studied and lesser-known auditory features. Second, I investigate whether ASR software is a good candidate model of the human auditory system. By comparing several state-of-the-art ASR systems to the results from humans on a range of psychometric experiments, I show that these ASR systems diverge markedly from humans in at least some psychometric tests. This implies that none of these systems act as a strong proxy for human speech recognition, although some may be useful when asking more narrowly defined questions. For neuroscientists, this thesis exemplifies how machine learning can be used to generate new hypotheses about human hearing, while also highlighting the caveats of investigating systems that may work fundamentally differently from the human brain. For machine learning engineers, I point to tangible directions for improving ASR systems. 
To motivate the continued cross-fertilization between these fields, a toolbox that allows researchers to assess new ASR systems has been released. Open Access

Spiral - Imperial College Digital Repository

Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab

Author: Gao Yingming
Publication date: 04/08/2022

Articulatory copy synthesis (ACS), a subarea of speech inversion, refers to the reproduction of natural utterances and involves both the physiological articulatory processes and their corresponding acoustic results. This thesis proposes two novel methods for the ACS of human speech using the articulatory speech synthesizer VocalTractLab (VTL) to address or mitigate the existing problems of speech inversion, such as non-unique mapping, acoustic variation among different speakers, and the time-consuming nature of the process. The first method involved finding appropriate VTL gestural scores for given natural utterances using a genetic algorithm. It consisted of two steps: gestural score initialization and optimization. In the first step, gestural scores were initialized using the given acoustic signals with speech recognition, grapheme-to-phoneme (G2P), and a VTL rule-based method for converting phoneme sequences to gestural scores. In the second step, the initial gestural scores were optimized by a genetic algorithm via an analysis-by-synthesis (ABS) procedure that sought to minimize the cosine distance between the acoustic features of the synthetic and natural utterances. The articulatory parameters were also regularized during the optimization process to restrict them to reasonable values. The second method was based on long short-term memory (LSTM) and convolutional neural networks, which were responsible for capturing the temporal dependence and the spatial structure of the acoustic features, respectively. The neural network regression models were trained, which used acoustic features as inputs and produced articulatory trajectories as outputs. In addition, to cover as much of the articulatory and acoustic space as possible, the training samples were augmented by manipulating the phonation type, speaking effort, and the vocal tract length of the synthetic utterances. 
Furthermore, two regularization methods were proposed: one based on the smoothness loss of articulatory trajectories and another based on the acoustic loss between original and predicted acoustic features. The best-performing genetic algorithm and convolutional LSTM systems (evaluated in terms of the difference between the estimated and reference VTL articulatory parameters) obtained average correlation coefficients of 0.985 and 0.983 for speaker-dependent utterances, respectively, and their reproduced speech achieved recognition accuracies of 86.25% and 64.69% for speaker-independent utterances of German words, respectively. When applied to German sentence utterances, as well as English and Mandarin Chinese word utterances, the neural-network-based ACS systems achieved recognition accuracies of 73.88%, 52.92%, and 52.41%, respectively. The results showed that both methods reproduced not only the articulatory processes but also the acoustic signals of the reference utterances. Moreover, the regularization methods led to more physiologically plausible articulatory processes and brought the estimated articulatory trajectories closer to those preferred by VTL, thus reproducing more natural and intelligible speech. This study also found that the convolutional layers, when used in conjunction with batch normalization layers, automatically learned more distinctive features from log power spectrograms. Furthermore, the neural-network-based ACS systems trained using German data could be generalized to the utterances of other languages
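
Two quantities named in this abstract, the cosine distance minimized in the analysis-by-synthesis loop and a smoothness loss on articulatory trajectories, can be sketched in a few lines. This is only an illustration: the abstract does not define the smoothness loss, so the mean squared frame-to-frame difference used here is an assumption, and the function names and feature values are hypothetical rather than taken from the thesis or from VocalTractLab.

```python
import math


def cosine_distance(u, v):
    """Cosine distance, 1 - cos(angle), between two feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def smoothness_penalty(trajectory):
    """Assumed smoothness loss: mean squared frame-to-frame difference
    of a single articulatory parameter over time."""
    steps = [(b - a) ** 2 for a, b in zip(trajectory, trajectory[1:])]
    return sum(steps) / len(steps) if steps else 0.0


# Hypothetical acoustic feature vectors for one frame.
natural_frame = [1.0, 0.0]
synthetic_match = [2.0, 0.0]      # same direction, different scale
synthetic_mismatch = [0.0, 3.0]   # orthogonal features

distance_match = cosine_distance(natural_frame, synthetic_match)
distance_mismatch = cosine_distance(natural_frame, synthetic_mismatch)

# A jittery trajectory is penalized more than a flat one.
jitter_penalty = smoothness_penalty([0.0, 1.0, 0.0])
flat_penalty = smoothness_penalty([1.0, 1.0, 1.0])
```

An optimizer of the kind the abstract describes would search for gestural scores that lower the acoustic distance, while a penalty like the one above discourages abrupt jumps in the articulatory parameters.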

Technische Universität Dresden: Qucosa

The impact of vocal expressions on the understanding of affective states in others

Author: Jürgens Rebecca
Publication date: 24/11/2014

An important aspect of everyday social life is recognizing emotional states in the people we interact with. We communicate our emotions and intentions not only through spoken utterances but also through facial expressions, body language, and tone of voice. These nonverbal emotional expressions are components of an emotion, which also include subjective feeling, action readiness, and the associated physiological reactions. Although emotional communication has been a focus of research for decades, it remains unclear which components of an emotion are actually communicated and how this information is processed. Moreover, emotional expressions play an important role in social interactions and are often used deliberately to display socially appropriate behavior. Their reliability in reflecting the other person's actual feelings is therefore questionable. Recognizing emotional expressions that are based on felt emotions is, however, of enormous importance for subsequent actions. The ability to distinguish felt from play-acted emotions should therefore be essential. Because vocal expressions are shaped by influences of the autonomic nervous system on the vocal tract, they are considered particularly promising for revealing underlying emotional states. The recognition of emotions in others is not fixed; it also depends, among other things, on the relationship between speaker and listener. An earlier study showed that emotions were recognized better in people who belong to the same group. On the one hand, this effect can be interpreted as a shift of attention toward people of higher social relevance. On the other hand, there are explanations that point to an increased readiness for empathic reactions.
In the research literature, successful understanding of emotions is closely linked to mirroring or simulating the perceived emotion. Affective neuroscience has so far demonstrated a shared neural network that is active when people perceive an emotion in others or feel it themselves. Neural activity in this network is also influenced by the social relevance of the person displaying the emotion. The extent to which mirroring an emotion at the behavioral level contributes to recognizing it, however, remains unclear. The question of how the speaker influences the empathic reaction has also not been conclusively answered. In this thesis, I investigated vocal emotion expressions and first tried to understand the relationship between play-acted and spontaneous expressions. I then focused on the question of what role sharing an emotion and the relevance of the speaker play in emotion recognition. In the first part of this thesis, I compared the perception of spontaneous and play-acted vocal expressions in a cross-cultural study. In contrast to spontaneous expressions, play-acted expressions were assumed to rely more heavily on social codes and therefore to be recognized less accurately by listeners from cultures other than the culture of origin. Alternatively, emotion recognition in both conditions could be universal. This cross-cultural comparison was carried out using 80 spontaneous emotion expressions recorded from people who were in emotional situations. The play-acted stimuli consisted of the re-enacted scenes, spoken by professional actors. Short sequences of these expressions were played to participants in Germany, Romania, and Indonesia. Participants were asked to indicate which emotion was portrayed and whether the expression was play-acted or genuine.
Overall, participants were poor at indicating whether an expression was play-acted. German listeners performed better in both tasks than listeners from the other cultures. This advantage was independent of the authenticity of the stimulus. Emotion recognition showed a comparable pattern in all cultures, which suggests a universal basis for emotion recognition. Recognition rates were generally low, and whether an expression was play-acted or genuine affected only the recognition of anger and sadness. Anger was recognized better when it was play-acted, and sadness when it was genuine. The second part of my thesis addressed the cause of the above-mentioned differences in emotion recognition and examined the influence of acting training on the credibility of emotion portrayals. To this end, I extended the stimulus corpus with emotion expressions spoken by speakers without acting experience. In addition to the rating study, I conducted an acoustic analysis of the speech recordings. It was predicted that professional actors would be better suited than speakers without acting experience to produce credible emotion expressions. This prediction could not be confirmed, however. On the contrary, the expressions of the professional actors were perceived as play-acted even more often than those of the inexperienced speakers. For the professional speakers, I was able to replicate the pattern of emotion recognition found in the cross-cultural study. The expressions of the inexperienced speakers, by contrast, differed from the spontaneous expressions only in lower recognition rates for sadness. The main effect in the acoustic analysis was a livelier speech melody in the play-acted expressions. In the third part of the thesis, I investigated the process of emotion recognition.
To this end, I experimentally manipulated the biographical similarity between fictitious speakers and the listener. Because a similar speaker is more relevant, emotional expressions should be recognized better in the similar condition than in the dissimilar one. To determine the influence of jointly experiencing an emotion on emotion recognition, I also recorded skin conductance and pupil change, both of which are markers of autonomic nervous system responses. Meanwhile, participants were presented with angry, joyful, and neutral vocal expressions, which they had to rate. Similarity had no influence on either emotion recognition or the peripheral physiological measures. Participants showed no skin conductance responses to vocal expressions. The pupil, by contrast, responded in an emotion-dependent manner. These findings suggest that affective processing does not involve the entire autonomic nervous system, at least not when only the voice is processed. Sharing an emotion therefore does not appear to be a necessary component of understanding or recognition. Similarity between speaker and listener might influence emotion processing in a lifelike setting in which a personal connection between the two interaction partners is possible, but not in a largely artificial manipulation. To take effect, empathic reactions require a more holistic approach. My thesis focused on understanding emotional communication with respect to vocal emotion expressions and showed that consciously listening to individual, context-free emotion expressions is not sufficient to infer actual emotional states. This is evident from the failure to differentiate play-acted from spontaneous emotion expressions.
Furthermore, I was able to show that vocal emotion expressions do not elicit strong autonomic nervous system responses in the listener. Communication via vocal emotional expressions therefore appears to rely more on cognitive than on affective processes

Effect of repetition protocol on verb naming and sentence generation in a Chinese anomia speaker

Author: So Pui-ling, Erin
蘇佩玲
Publication venue: The University of Hong Kong (Pokfulam, Hong Kong)
Publication date: 01/01/2009

"A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, June 30, 2009." Thesis (B.Sc)--University of Hong Kong, 2009. Includes bibliographical references (p. 28-30). published_or_final_version. Speech and Hearing Sciences. Bachelor. Bachelor of Science in Speech and Hearing Science

HKU Scholars Hub

Opportunities, pitfalls and trade-offs in designing protocols for measuring the neural correlates of speech

Author: Cooney Ciaran
Coyle Damien
Folli Raffaella
Publication date: 01/09/2022

Ulster University's Research Portal

WHERE IS THE LOCUS OF DIFFICULTY IN RECOGNIZING FOREIGN-ACCENTED WORDS? NEIGHBORHOOD DENSITY AND PHONOTACTIC PROBABILITY EFFECTS ON THE RECOGNITION OF FOREIGN-ACCENTED WORDS BY NATIVE ENGLISH LISTENERS

Author: Chan Kit Ying
Publication venue: 'Paleontological Institute at The University of Kansas'
Publication date: 01/01/2012

This series of experiments (1) examined whether native listeners experience recognition difficulty in all kinds of foreign-accented words or only in a subset of words with certain lexical and sub-lexical characteristics, namely neighborhood density and phonotactic probability; (2) identified the locus of foreign-accented word recognition difficulty; and (3) investigated how accent-induced mismatches impact the lexical retrieval process. Experiments 1 and 4 examined the recognition of native-produced and foreign-accented words varying in neighborhood density with auditory lexical decision and perceptual identification tasks, respectively, which emphasize the lexical level of processing. Findings from Experiment 1 revealed an increased accent-induced processing cost in reaction times, especially for words with many similar-sounding words, implying that native listeners increase their reliance on top-down lexical knowledge during foreign-accented word recognition. Analysis of perception errors from Experiment 4 found the misperceptions in the foreign-accented condition to be more similar to the target words than those in the native-produced condition. This suggests that accent-induced mismatches tend to activate similar-sounding words as alternative word candidates, which possibly pose increased lexical competition for the target word and result in greater processing costs for foreign-accented word recognition at the lexical level. Experiments 2 and 3 examined the sub-lexical processing of the foreign-accented words varying in neighborhood density and phonotactic probability, respectively, with a same-different matching task, which emphasizes the sub-lexical level of processing. Findings from both experiments revealed no extra processing costs, in either reaction times or accuracy rates, for the foreign-accented stimuli, implying that the sub-lexical processing of the foreign-accented words is as good as that of the native-produced words.
Taken together, the overall recognition difficulty of foreign-accented stimuli, as well as the differentially increased processing difficulty for accented dense words (observed in Experiment 1), mainly stems from the lexical level, due to the increased lexical competition posed by the similar-sounding word candidates