Bayesian Microphone Array Processing
Doctoral thesis, Doctor of Informatics, Kyoto University. Graduate School of Informatics, Department of Intelligence Science and Technology. Thesis committee: Prof. Hiroshi G. Okuno (chief examiner), Prof. Tatsuya Kawahara, Assoc. Prof. Marco Cuturi Cameto, and Lecturer Kazuyoshi Yoshii.
The Neural Time Course of Semantic Ambiguity Resolution in Speech Comprehension.
Semantically ambiguous words challenge speech comprehension, particularly when listeners must select a less frequent (subordinate) meaning at disambiguation. Using combined magnetoencephalography (MEG) and EEG, we measured neural responses associated with distinct cognitive operations during semantic ambiguity resolution in spoken sentences: (i) initial activation and selection of meanings in response to an ambiguous word and (ii) sentence reinterpretation in response to subsequent disambiguation to a subordinate meaning. Ambiguous words elicited an increased neural response approximately 400-800 msec after their acoustic offset compared with unambiguous control words in left frontotemporal MEG sensors, corresponding to sources in bilateral frontotemporal brain regions. This response may reflect increased demands on processes by which multiple alternative meanings are activated and maintained until later selection. Disambiguating words heard after an ambiguous word were associated with marginally increased neural activity over bilateral temporal MEG sensors and a central cluster of EEG electrodes, which localized to similar bilateral frontal and left temporal regions. This later neural response may reflect effortful semantic integration or elicitation of prediction errors that guide reinterpretation of previously selected word meanings. Across participants, the amplitude of the ambiguity response showed a marginal positive correlation with comprehension scores, suggesting that sentence comprehension benefits from additional processing around the time of an ambiguous word. Better comprehenders may have increased availability of subordinate meanings, perhaps due to higher quality lexical representations and reflected in a positive correlation between vocabulary size and comprehension success.
Scalable Source Localization with Multichannel Alpha-Stable Distributions
In this paper, we focus on the problem of sound source localization and propose a technique that exploits the known but arbitrary geometry of the microphone array. While most probabilistic techniques presented in the past rely on Gaussian models, we go further in this direction and detail a method for source localization based on the recently proposed alpha-stable harmonizable processes. These include the Cauchy and Gaussian distributions as special cases, and their remarkable feature is to allow simple modeling of impulsive, real-world sounds with few parameters. The approach builds on the classical convolutive mixing model and has the particular advantages of requiring only a single pass through the data, of working in the underdetermined case of more sources than microphones, and of allowing massively parallelizable implementations operating in the time-frequency domain. We show that the method yields promising performance for acoustic imaging in realistic simulations.
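The appeal of the alpha-stable family mentioned above is that a single parameter interpolates between the Gaussian (alpha = 2) and heavy-tailed laws such as the Cauchy (alpha = 1). A minimal sketch, independent of the paper's localization algorithm, illustrates why heavy tails suit impulsive sounds: far-from-zero samples remain common under a Cauchy law but are vanishingly rare under a Gaussian.

```python
import numpy as np

# Illustrative sketch only (not the paper's method): compare tail mass of
# two special cases of the alpha-stable family.
rng = np.random.default_rng(0)
n = 100_000
gaussian = rng.normal(size=n)          # alpha = 2
cauchy = rng.standard_cauchy(size=n)   # alpha = 1

# Fraction of samples with magnitude beyond 5: heavy tails keep this large,
# which lets a few parameters account for impulsive events.
def tail_mass(x):
    return np.mean(np.abs(x) > 5.0)

print(f"Gaussian tail mass beyond 5: {tail_mass(gaussian):.5f}")
print(f"Cauchy   tail mass beyond 5: {tail_mass(cauchy):.5f}")
```

For the standard Cauchy, roughly an eighth of the samples exceed magnitude 5, while for the standard Gaussian essentially none do, which is the modeling gap the alpha-stable framework closes with one extra parameter.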
Dynamic speech networks in the brain: Dual contribution of incrementality and constraints in access to semantics
This thesis explores the spatiotemporal network dynamics underlying natural speech comprehension, as measured by electro-magnetoencephalography (E/MEG). I focus on the transient effects of incrementality and constraints in speech on access to lexical semantics. Through three E/MEG experiments I address two core issues in systems neuroscience of language: 1) What are the network dynamics underpinning cognitive computations that take place when we map sounds to rich semantic representations? 2) How do the prior semantic and syntactic contextual constraints facilitate this mapping?
Experiment 1 investigated the cognitive processes and relevant networks that come online prior to a word’s recognition point (e.g. the “f” of “butterfly”) as we access meaning through speech in isolation. The results revealed that, 300 ms before the word was recognised, the unfolding speech incrementally activated matching phonological and semantic representations, resulting in transient competition. This competition recruited the LIFG and modality-specific regions (LSMG and LSTG for the phonological domain; LAG and MTG for the semantic domain). Immediately after the word’s recognition point, the semantic representation of the target concept was boosted and rapidly accessed, recruiting bilateral MTG and AG.
Experiment 2 explored the cortical networks underpinning contextual semantic processing in speech. Participants listened to two-word spoken phrases in which the semantic constraint provided by the modifier was manipulated. To separate cognitive networks modulated by semantic constraint from task-positive networks, I performed a temporal independent component analysis. Among the 14 networks extracted, only the activity of bilateral AG was modulated by semantic constraint, from −400 to −300 ms before the noun’s recognition point.
Experiment 3 addressed the influence of sentential syntactic constraint on the anticipation and activation of upcoming syntactic frames in speech. Participants listened to sentences with local syntactic ambiguities. Analysis of the connectivity dynamics in the left frontotemporal syntax network showed that processing sentences containing the less anticipated syntactic structure elicited increased feedforward information flow at 0-100 ms, followed by increased recurrent connectivity between LIFG and LpMTG from 200-500 ms after verb onset.
Altogether, the three experiments reveal novel insights into the transient cognitive networks recruited incrementally over time, both with and without context, as speech unfolds, and into how the activation of these networks is modulated by contextual syntactic and semantic constraints. Further, I provide neural evidence that contextual constraints serve to facilitate speech comprehension, and show how the speech networks recover from failed anticipations.
Funding: Istanbul Chamber of Commerce; Turkish Petrol Foundation.
Variational Bayesian Inference for Source Separation and Robust Feature Extraction
We consider the task of separating and classifying individual sound sources mixed together. The main challenge is to achieve robust classification despite residual distortion of the separated source signals. A promising paradigm is to estimate the uncertainty about the separated source signals and to propagate it through the subsequent feature extraction and classification stages. We argue that variational Bayesian (VB) inference offers a mathematically rigorous way of deriving uncertainty estimators, in contrast with state-of-the-art estimators based on heuristics or on maximum likelihood (ML) estimation. We propose a general VB source separation algorithm that makes it possible to jointly exploit spatial and spectral models of the sources. This algorithm achieves 6% and 5% relative error reduction compared to ML uncertainty estimation on the CHiME noise-robust speaker identification and speech recognition benchmarks, respectively, and it opens the way for more complex VB approximations of uncertainty.
In this paper, we consider the problem of extracting the feature descriptors of each source in a multi-source audio recording using a general source separation algorithm. The difficulty lies in estimating the uncertainty about the sources and propagating it to the descriptors, so as to estimate them robustly despite separation errors. State-of-the-art methods estimate the uncertainty heuristically, whereas we propose to integrate over the parameters of the source separation algorithm. To this end, we describe a variational Bayesian inference method for estimating the posterior distribution of the sources, and we then compute the expectation of the descriptors by propagating the uncertainty via moment matching. We evaluate the accuracy of the descriptors in terms of mean squared error and run speaker recognition experiments to assess the resulting performance on a real problem. In both cases, the proposed method gives the best results.
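The core of uncertainty propagation by moment matching can be shown in a few lines. This is a toy sketch under an assumed Gaussian posterior on a separated source sample, not the paper's full algorithm: for a quadratic feature such as signal energy, the posterior expectation has a closed form that an ML-style plug-in estimate misses.

```python
import numpy as np

# Sketch: given a (hypothetical) Gaussian posterior N(mu, var) over a
# separated source sample s, the expected quadratic feature f(s) = s**2 is
# mu**2 + var by moment matching, whereas the plug-in estimate f(mu) = mu**2
# ignores the separation uncertainty entirely.
def expected_energy(mu, var):
    """E[s^2] under s ~ N(mu, var), by moment matching."""
    return mu**2 + var

mu, var = 0.5, 0.2
plug_in = mu**2                      # ML-style point estimate
matched = expected_energy(mu, var)   # uncertainty-aware estimate

# Monte Carlo check that moment matching gives the correct expectation.
rng = np.random.default_rng(0)
samples = rng.normal(mu, np.sqrt(var), size=200_000)
print(plug_in, matched, np.mean(samples**2))
```

The gap between the two estimates grows with the posterior variance, i.e. exactly when separation is least reliable, which is why propagating uncertainty rather than a point estimate helps downstream classification.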
Neurobiology of incremental speech comprehension
Understanding spoken language requires the rapid transition from perceptual processing of the auditory input through a variety of cognitive processes involved in constructing the mental representation of the message that the speaker is intending to convey. Listeners carry out these complex processes very rapidly and accurately as they hear each word incrementally unfolding in a sentence. However, little is known about the specific spatiotemporal patterning of this wide range of incremental processing operations that underpin the dynamic transitions from the speech input to the development of a meaning interpretation of an utterance. This thesis aims to address this set of issues by investigating the spatiotemporal dynamics of brain activity as spoken sentences unfold over time in order to illuminate the neurocomputational properties of the human language processing system and determine how the representation of a spoken sentence develops incrementally as each upcoming word is heard.
Using a novel application of multidimensional probabilistic modelling combined with models from computational linguistics, I developed models of a variety of computational processes associated with accessing and processing the syntactic and semantic properties of sentences and tested these models at various points as sentences unfolded over time. Since a wide range of incremental processes occur very rapidly during speech comprehension, it is crucial to keep track of the temporal dynamics of the neural computations involved. To do this, I used combined electroencephalography and magnetoencephalography (EMEG) to record neural activity with millisecond resolution and analyzed the recordings in source space using univariate and/or multivariate approaches. The results confirm the value of this combination of methods in examining the properties of incremental speech processing. My findings corroborate the predictive nature of human speech comprehension and demonstrate that the effects of early semantic constraint are not dependent on explicit syntactic knowledge.
The neuro-cognitive representation of word meaning resolved in space and time.
One of the core human abilities is that of interpreting symbols. Prompted with a perceptual stimulus devoid of any intrinsic meaning, such as a written word, our brain can access a complex multidimensional representation, called semantic representation, which corresponds to its meaning. Notwithstanding decades of neuropsychological and neuroimaging work on the cognitive and neural substrate of semantic representations, many questions are left unanswered. The research in this dissertation attempts to unravel one of them: are the neural substrates of different components of concrete word meaning dissociated?
In the first part, I review the different theoretical positions and empirical findings on the cognitive and neural correlates of semantic representations. I highlight how recent methodological advances, namely the introduction of multivariate methods for the analysis of distributed patterns of brain activity, broaden the set of hypotheses that can be empirically tested. In particular, they allow the exploration of the representational geometries of different brain areas, which is instrumental to the understanding of where and when the various dimensions of the semantic space are activated in the brain. Crucially, I propose an operational distinction between motor-perceptual dimensions (i.e., those attributes of the objects referred to by the words that are perceived through the senses) and conceptual ones (i.e., the information that is built via a complex integration of multiple perceptual features).
In the second part, I present the results of the studies I conducted in order to investigate the automaticity of retrieval, topographical organization, and temporal dynamics of motor-perceptual and conceptual dimensions of word meaning. First, I show how the representational spaces retrieved with different behavioral and corpora-based methods (i.e., Semantic Distance Judgment, Semantic Feature Listing, WordNet) appear to be highly correlated and overall consistent within and across subjects. Second, I present the results of four priming experiments suggesting that perceptual dimensions of word meaning (such as implied real world size and sound) are recovered in an automatic but task-dependent way during reading. Third, thanks to a functional magnetic resonance imaging experiment, I show a representational shift along the ventral visual path: from perceptual features, preferentially encoded in primary visual areas, to conceptual ones, preferentially encoded in mid and anterior temporal areas. This result indicates that complementary dimensions of the semantic space are encoded in a distributed yet partially dissociated way across the cortex. Fourth, by means of a study conducted with magnetoencephalography, I present evidence of an early (around 200 ms after stimulus onset) simultaneous access to both motor-perceptual and conceptual dimensions of the semantic space via different aspects of the signal: inter-trial phase coherence appears to be key for the encoding of perceptual dimensions, while spectral power changes appear to support the encoding of conceptual ones.
These observations suggest that the neural substrates of different components of symbol meaning can be dissociated in terms of localization and of the signal features encoding them, while sharing a similar temporal evolution.
Bayesian Modeling and Estimation Techniques for the Analysis of Neuroimaging Data
Brain function is hallmarked by its adaptivity and robustness, arising from underlying neural activity that admits well-structured representations in the temporal, spatial, or spectral domains. While neuroimaging techniques such as Electroencephalography (EEG) and magnetoencephalography (MEG) can record rapid neural dynamics at high temporal resolutions, they face several signal processing challenges that hinder their full utilization in capturing these characteristics of neural activity. The objective of this dissertation is to devise statistical modeling and estimation methodologies that account for the dynamic and structured representations of neural activity and to demonstrate their utility in application to experimentally-recorded data.
The first part of this dissertation concerns spectral analysis of neural data. In order to capture the non-stationarities involved in neural oscillations, we integrate multitaper spectral analysis and state-space modeling in a Bayesian estimation setting. We also present a multitaper spectral analysis method tailored for spike trains that captures the non-linearities involved in neuronal spiking. We apply our proposed algorithms to both EEG and spike recordings, which reveal significant gains in spectral resolution and noise reduction.
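A classical (non-Bayesian) multitaper estimate, the starting point that the dissertation extends with state-space modeling, can be sketched briefly: periodograms computed with orthogonal Slepian (DPSS) tapers are averaged, trading a small, controlled bias for a large variance reduction. The signal and parameter choices below are illustrative assumptions.

```python
import numpy as np
from scipy.signal.windows import dpss

# Minimal multitaper PSD sketch: a 50 Hz tone in white noise.
fs, n = 1000.0, 1000
t = np.arange(n) / fs
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 50.0 * t) + 0.5 * rng.normal(size=n)

nw, k = 4.0, 7                      # time-bandwidth product, number of tapers
tapers = dpss(n, nw, k)             # shape (k, n): orthonormal Slepian tapers
spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
psd = spectra.mean(axis=0)          # average over tapers -> variance reduction
freqs = np.fft.rfftfreq(n, d=1 / fs)
print(f"spectral peak near {freqs[np.argmax(psd)]:.1f} Hz")
```

The estimate concentrates the tone's energy within the design bandwidth NW·fs/n (here 4 Hz) around 50 Hz; the Bayesian state-space extension described above additionally tracks how such spectra evolve across overlapping windows.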
In the second part, we investigate cortical encoding of speech as manifested in MEG responses. These responses are often modeled via a linear filter, referred to as the temporal response function (TRF). While the TRFs estimated from the sensor-level MEG data have been widely studied, their cortical origins are not fully understood. We define the new notion of Neuro-Current Response Functions (NCRFs) for simultaneously determining the TRFs and their cortical distribution. We develop an efficient algorithm for NCRF estimation and apply it to MEG data, which provides new insights into the cortical dynamics underlying speech processing.
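The sensor-level TRF that the NCRF generalizes is just a regularized linear filter from a stimulus feature to the response. A minimal sketch, with an assumed exponential ground-truth filter standing in for real MEG data, recovers the filter by ridge regression on a lagged design matrix.

```python
import numpy as np

# Sketch of classical TRF estimation (ridge-regularized least squares).
rng = np.random.default_rng(0)
n, lags = 5000, 30
stim = rng.normal(size=n)                   # stimulus feature (e.g. envelope)
h_true = np.exp(-np.arange(lags) / 8.0)     # hypothetical ground-truth TRF
resp = np.convolve(stim, h_true)[:n] + 0.1 * rng.normal(size=n)

# Lagged design matrix: column j holds the stimulus delayed by j samples.
X = np.column_stack(
    [np.concatenate([np.zeros(j), stim[: n - j]]) for j in range(lags)]
)
lam = 1.0                                   # ridge penalty
h_est = np.linalg.solve(X.T @ X + lam * np.eye(lags), X.T @ resp)
print(f"correlation with true TRF: {np.corrcoef(h_true, h_est)[0, 1]:.3f}")
```

The NCRF approach replaces this sensor-space regression with filters defined directly on cortical current sources, so that the response function and its cortical distribution are estimated jointly.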
Finally, in the third part, we consider the inference of Granger causal (GC) influences in high-dimensional time series models with sparse coupling. Focusing on a canonical sparse bivariate autoregressive model, we define a new statistic for inferring GC influences, which we refer to as the LASSO-based Granger Causal (LGC) statistic. We establish non-asymptotic guarantees for robust identification of GC influences via the LGC statistic. Applications to simulated and real data demonstrate the utility of the LGC statistic in robust GC identification.
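The idea underlying LASSO-based GC inference can be sketched on a simulated bivariate AR(1) system where x drives y but not conversely; the lambda value, iteration count, and ISTA solver below are illustrative choices, not the dissertation's LGC statistic itself.

```python
import numpy as np

def lasso_ista(X, y, lam=0.05, iters=500):
    """Minimal LASSO via ISTA: minimize (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    b = np.zeros(p)
    for _ in range(iters):
        g = X.T @ (X @ b - y) / n          # gradient of the quadratic part
        b = b - step * g
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)  # soft-threshold
    return b

# Hypothetical sparse bivariate AR(1): x Granger-causes y, not vice versa.
rng = np.random.default_rng(0)
T = 2000
x, y = np.zeros(T), np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + rng.normal()

Z = np.column_stack([x[:-1], y[:-1]])      # lagged predictors
b_y = lasso_ista(Z, y[1:])                 # regress y_t on (x_{t-1}, y_{t-1})
b_x = lasso_ista(Z, x[1:])                 # regress x_t on (x_{t-1}, y_{t-1})
print("x -> y lag coefficient:", round(b_y[0], 3))  # clearly nonzero
print("y -> x lag coefficient:", round(b_x[1], 3))  # shrunk toward zero
```

The sparsity-inducing penalty drives the absent cross-coupling toward exactly zero, which is what makes a thresholded LASSO coefficient a natural basis for a GC detection statistic.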
Dagstuhl News January - December 2011
"Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News give a summary of the scientific work being done in Dagstuhl. Each Dagstuhl Seminar is presented by a small abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic