Decoding of high-level linguistic component processing during semantic perception of speech
Thesis (M.S.) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Brain Science, August 2022. Advisor: Chun Kee Chung.
High-level linguistic processing in the human brain remains incompletely understood and constitutes a challenging topic in speech neuroscience. While most studies have focused on decoding low-level phonetic components from intracranial recordings during speech perception, few have attempted to decode high-level syntactic or semantic features, and the studies that do target semantic decoding are mostly conducted with picture-naming tasks, which probe visual rather than spoken language.
The present study aims to better characterize the neural representations underlying spoken language perception, focusing not on lower-level language components such as phonemes or phonetics but on higher-level components such as syntax and semantics. Since language processing is widely held to be tripartite, comprising phonology, syntax, and semantics, an analysis strategy that excludes the influence of phonetic factors was essential. We therefore conducted a question-and-answer speech task containing four questions revolving around two semantic categories (alive, body parts), using phonetically controlled words.
Intracranial neural signals were recorded with electrocorticography (ECoG) electrodes from 14 epilepsy patients during the question-and-answer speech task. Post hoc brain activity analysis was restricted to the three subjects who answered every trial correctly (144 trials in total), ensuring that the analyzed data contained only brain signals collected during correct semantic processing. The decoding results suggest that absolute and relative spectral neural feature trends occur across all participants within particular time windows. Furthermore, the spatial distribution of the neural features yielding the best decoding accuracy is consistent with the current biophysiological model of brain language processing, which describes the circular nature of word-meaning comprehension in the left-hemisphere language network.
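The time-windowed spectral decoding summarized above can be sketched minimally as a nearest-centroid classifier over band-power features (the feature values, electrode count, and classifier choice here are all hypothetical illustrations, not the thesis's actual pipeline):

```python
import math

def centroid(vectors):
    """Mean feature vector of one semantic class."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(x, centroids):
    """Nearest-centroid decoding of the semantic category (e.g. alive
    vs. body part) from spectral features in one time window."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return min(centroids, key=lambda label: dist(x, centroids[label]))

# Hypothetical band-power features per trial (two electrodes):
train = {
    "alive":     [[1.0, 0.2], [0.9, 0.3], [1.1, 0.1]],
    "body part": [[0.2, 1.0], [0.3, 0.9], [0.1, 1.2]],
}
centroids = {label: centroid(trials) for label, trials in train.items()}
```

Held-out trials are then assigned to whichever class centroid their window features fall closest to.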
Telephone speech recognition via the combination of knowledge sources in a segmental speech model
The currently dominant speech recognition methodology, hidden Markov modeling, treats speech as a stochastic process with very simple mathematical properties. The simplistic assumptions of the model, especially the independence of the observation vectors, have been criticized by many in the literature, and alternative solutions have been proposed. One such alternative is segmental modeling, and the OASIS recognizer we have been developing in recent years belongs to this category. In this paper we go one step further and suggest that speech recognition be considered a knowledge source combination problem. We offer a generalized algorithmic framework for this approach and show that both hidden Markov and segmental modeling are special cases of this decoding scheme. In the second part of the paper we describe the current components of the OASIS system and evaluate its performance on a very difficult recognition task: the phonetically balanced sentences of the MTBA Hungarian Telephone Speech Database. Our results show that OASIS outperforms a traditional HMM system in phoneme classification and achieves practically the same recognition scores at the sentence level.
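The knowledge-source-combination view can be sketched as follows (toy scorers and weights, not the actual OASIS models): each source assigns a log-domain score to a candidate segment, the decoder combines them log-linearly, and a dynamic program searches over segmentations. Restricting segments to one frame recovers HMM-style frame-synchronous decoding; allowing longer segments gives segmental decoding.

```python
import math

# Hypothetical knowledge sources: each assigns a log-domain score to a
# (segment, label) pair. These toy scorers stand in for real models.
def acoustic_score(segment, label):
    return -0.5 * abs(len(segment) - 3)   # toy: prefers ~3-frame segments

def duration_score(segment, label):
    return -0.1 * len(segment)            # toy: mild length penalty

def combine(scores, weights):
    """Weighted log-linear combination of the knowledge sources."""
    return sum(w * s for w, s in zip(weights, scores))

def decode(frames, labels, sources, weights, max_len=5):
    """Best-scoring segmentation and labeling via dynamic programming.
    With max_len=1 this collapses to frame-synchronous (HMM-like)
    decoding; max_len > 1 gives segmental decoding."""
    best = [(-math.inf, None)] * (len(frames) + 1)
    best[0] = (0.0, [])
    for end in range(1, len(frames) + 1):
        for start in range(max(0, end - max_len), end):
            seg = frames[start:end]
            for lab in labels:
                total = best[start][0] + combine(
                    [src(seg, lab) for src in sources], weights)
                if total > best[end][0]:
                    best[end] = (total, best[start][1] + [(start, end, lab)])
    return best[len(frames)]
```

With the toy scorers above, six frames decode into two three-frame segments, while forcing max_len=1 yields six one-frame segments, illustrating how both modeling styles fall out of one scheme.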
Spoonerisms: An Analysis of Language Processing in Light of Neurobiology
Spoonerisms are the category of speech errors involving jumbled-up words. The author examines language, the brain, and the correlation between spoonerisms and the neural structures involved in language processing.
Transformation of a temporal speech cue to a spatial neural code in human auditory cortex
In speech, listeners extract continuously varying spectrotemporal cues from the acoustic signal to perceive discrete phonetic categories. Spectral cues are spatially encoded in the amplitude of responses in phonetically tuned neural populations in auditory cortex. It remains unknown whether similar neurophysiological mechanisms encode temporal cues like voice-onset time (VOT), which distinguishes sounds like /b/ and /p/. We used direct brain recordings in humans to investigate the neural encoding of temporal speech cues with a VOT continuum from /ba/ to /pa/. We found that distinct neural populations respond preferentially to VOTs from one phonetic category, and are also sensitive to sub-phonetic VOT differences within a population's preferred category. In a simple neural network model, simulated populations tuned to detect either temporal gaps or coincidences between spectral cues captured the encoding patterns observed in real neural data. These results demonstrate that a spatial/amplitude neural code underlies the cortical representation of both spectral and temporal speech cues.
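A toy illustration of such gap/coincidence tuning (illustrative only; the paper's actual network model is not reproduced here) treats each population as a Gaussian tuning curve over the burst-to-voicing lag:

```python
import math

def population_response(burst_t, voicing_t, preferred_vot, width=10.0):
    """Toy tuning curve: a neural population responds maximally when the
    burst-to-voicing lag (the VOT, in ms) is near its preferred value."""
    vot = voicing_t - burst_t
    return math.exp(-((vot - preferred_vot) ** 2) / (2 * width ** 2))

def coincidence_pop(burst_t, voicing_t):
    """'Coincidence' population: prefers near-simultaneous cues (/ba/-like)."""
    return population_response(burst_t, voicing_t, preferred_vot=0.0)

def gap_pop(burst_t, voicing_t):
    """'Gap' population: prefers a long lag (/pa/-like)."""
    return population_response(burst_t, voicing_t, preferred_vot=50.0)
```

Which population responds with the larger amplitude then carries the phonetic category, i.e. a spatial/amplitude code for a temporal cue.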
The Mason-Alberta Phonetic Segmenter: A forced alignment system based on deep neural networks and interpolation
Forced alignment systems automatically determine boundaries between segments
in speech data, given an orthographic transcription. These tools are
commonplace in phonetics to facilitate the use of speech data that would be
infeasible to manually transcribe and segment. In the present paper, we
describe a new neural network-based forced alignment system, the Mason-Alberta
Phonetic Segmenter (MAPS). The MAPS aligner serves as a testbed for two
possible improvements we pursue for forced alignment systems. The first is
treating the acoustic model in a forced aligner as a tagging task, rather than
a classification task, motivated by the common understanding that segments in
speech are not truly discrete and commonly overlap. The second is an
interpolation technique to allow boundaries more precise than the common 10 ms
limit in modern forced alignment systems. We compare configurations of our
system to a state-of-the-art system, the Montreal Forced Aligner. The tagging
approach did not generally yield improved results over the Montreal Forced
Aligner. However, a system with the interpolation technique had a 27.92%
increase relative to the Montreal Forced Aligner in the number of boundaries
within 10 ms of the target on the test set. We also reflect on the task and
training process for acoustic modeling in forced alignment, highlighting how
the output targets for these models do not match phoneticians' conception of
similarity between phones and that reconciliation of this tension may require
rethinking the task and output targets or how speech itself should be
segmented.
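Sub-frame boundary interpolation of the kind described above can be sketched with quadratic interpolation around the peak per-frame boundary score (a hedged sketch; the exact interpolation scheme in MAPS may differ):

```python
def refine_boundary(scores, frame_ms=10.0):
    """Refine a boundary estimate below the frame step by fitting a
    parabola through the peak frame score and its two neighbours
    (quadratic interpolation).

    scores: per-frame boundary scores at frame_ms spacing.
    Returns the interpolated boundary time in ms."""
    i = max(range(len(scores)), key=scores.__getitem__)
    if i == 0 or i == len(scores) - 1:
        return i * frame_ms            # no neighbours to interpolate with
    a, b, c = scores[i - 1], scores[i], scores[i + 1]
    denom = a - 2.0 * b + c
    offset = 0.0 if denom == 0.0 else 0.5 * (a - c) / denom
    return (i + offset) * frame_ms
```

A symmetric peak stays on the 10 ms grid, while an asymmetric one is nudged toward the stronger neighbour, yielding boundaries finer than the frame step.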
Better Evaluation of ASR in Speech Translation Context Using Word Embeddings
This paper investigates the evaluation of ASR in a spoken language translation context. More precisely, we propose a simple extension of the WER metric that penalizes substitution errors differently according to their context, using word embeddings. For instance, the proposed metric should catch near matches (mainly morphological variants) and penalize this kind of error less, since it has a more limited impact on translation performance. Our experiments show that the proposed metric correlates better with SLT performance than WER does. Oracle experiments are also conducted and show the ability of our metric to find better hypotheses (to be translated) in the ASR N-best lists. Finally, a preliminary experiment in which ASR tuning is based on our new metric shows encouraging results. For reproducible experiments, the code implementing our modified WER and the corpora used are made available to the research community.
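The proposed idea can be sketched as an edit-distance WER whose substitution cost shrinks with embedding similarity (the toy two-dimensional embeddings and the exact cost form below are illustrative assumptions, not the paper's definitions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def soft_wer(ref, hyp, emb):
    """Edit-distance WER where a substitution costs
    1 - cosine(emb[ref_word], emb[hyp_word]) instead of a flat 1,
    so near matches (e.g. morphological variants) are penalized less."""
    n, m = len(ref), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)
    for j in range(1, m + 1):
        d[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                sub = 0.0
            else:
                sub = 1.0 - cosine(emb[ref[i - 1]], emb[hyp[j - 1]])
            d[i][j] = min(d[i - 1][j] + 1.0,      # deletion
                          d[i][j - 1] + 1.0,      # insertion
                          d[i - 1][j - 1] + sub)  # (soft) substitution
    return d[n][m] / n

# Toy embeddings: "cats" is close to "cat", "dog" is not.
emb = {"cat": [1.0, 0.0], "cats": [0.9, 0.1], "dog": [0.0, 1.0]}
```

Under this metric, substituting "cats" for "cat" costs almost nothing, while substituting "dog" still costs a full error, which is the behaviour the paper argues tracks translation impact.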
MULTIVARIATE ANALYSIS FOR UNDERSTANDING COGNITIVE SPEECH PROCESSING
Cortical encoding and decoding models of speech production
To speak is to dynamically orchestrate the movements of the articulators (jaw, tongue, lips, and larynx), which in turn generate speech sounds. It is an amazing mental and motor feat that is controlled by the brain and is fundamental to communication. Technology that could translate brain signals into speech would be transformative for people who are unable to communicate as a result of neurological impairments. This work first investigates how the articulator movements that underlie natural speech production are represented in the brain. Building upon this, it also presents a neural decoder that can synthesize audible speech from brain signals. The supporting data were direct cortical recordings of the human sensorimotor cortex while participants spoke natural sentences. Neural activity at individual electrodes encoded a diversity of articulatory kinematic trajectories (AKTs), each revealing coordinated articulator movements toward specific vocal tract shapes. The neural decoder was designed to leverage the kinematic trajectories encoded in the sensorimotor cortex, which enhanced performance even with limited data. In closed-vocabulary tests, listeners could readily identify and transcribe speech synthesized from cortical activity. These findings advance the clinical viability of using speech neuroprosthetic technology to restore spoken communication.
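The two-stage design (neural activity to articulatory kinematics to acoustics) can be sketched with placeholder linear maps (all dimensions and weights below are hypothetical stand-ins for the trained decoder, whose architecture is not specified here):

```python
import numpy as np

rng = np.random.default_rng(0)
n_elec, n_kin, n_acoustic, T = 64, 12, 32, 100   # hypothetical dimensions

# Placeholder linear stages standing in for the trained networks:
W_kin = rng.standard_normal((n_kin, n_elec)) * 0.1      # neural -> kinematics
W_ac = rng.standard_normal((n_acoustic, n_kin)) * 0.1   # kinematics -> acoustics

neural = rng.standard_normal((n_elec, T))   # cortical features over time
kinematics = W_kin @ neural                 # intermediate articulatory (AKT) space
acoustics = W_ac @ kinematics               # spectral features for synthesis
```

The intermediate kinematic stage is the design choice highlighted in the abstract: routing the decoder through an articulatory representation is what reportedly helped performance even with limited data.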
- โฆ