
    LIPSFUS: A neuromorphic dataset for audio-visual sensory fusion of lip reading


    RN-Net: Reservoir Nodes-Enabled Neuromorphic Vision Sensing Network

    Event-based cameras are inspired by the sparse, asynchronous spike representation of the biological visual system. However, processing event data requires either expensive feature descriptors to transform spikes into frames, or spiking neural networks that are costly to train. In this work, we propose the Reservoir Nodes-enabled neuromorphic vision sensing Network (RN-Net), an architecture based on simple convolution layers integrated with dynamic temporal encoding reservoirs for local and global spatiotemporal feature detection at low hardware and training cost. RN-Net processes asynchronous temporal features efficiently, achieving the highest accuracy reported to date on DVS128 Gesture (99.2%) and one of the highest accuracies on the DVS Lip dataset (67.5%) at a much smaller network size. By leveraging internal device and circuit dynamics, asynchronous temporal feature encoding can be implemented at very low hardware cost, without preprocessing or dedicated memory and arithmetic units. The use of simple DNN blocks and standard backpropagation-based training rules further reduces implementation costs. Comment: 12 pages, 5 figures, 4 tables
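The abstract does not give the reservoir node equations, but the core idea of dynamic temporal encoding can be sketched as a grid of leaky integrators, one per pixel, whose state decays between events. This is a minimal illustrative sketch under that assumption, not RN-Net's actual node model; the event format and time constant are hypothetical.

```python
import numpy as np

def reservoir_encode(events, shape=(128, 128), tau=50.0, t_end=300.0):
    """Encode an asynchronous event stream into a dense feature map
    using one leaky-integrator reservoir node per pixel.

    events: list of (t_ms, x, y) tuples, sorted by time. Each node
    accumulates incoming spikes and decays with time constant tau,
    so the final map reflects both spatial structure and how recent
    the activity was -- no frame conversion or preprocessing needed.
    """
    state = np.zeros(shape)
    last_t = 0.0
    for t, x, y in events:
        state *= np.exp(-(t - last_t) / tau)  # leaky decay since last event
        state[y, x] += 1.0                    # integrate the new spike
        last_t = t
    state *= np.exp(-(t_end - last_t) / tau)  # decay to the readout time
    return state

# Recent events leave a stronger trace than old ones at readout time.
old = [(0.0, 10, 10)] * 5     # burst at t = 0 ms
new = [(290.0, 20, 20)] * 5   # burst at t = 290 ms
m = reservoir_encode(old + new)
```

A downstream convolutional readout can then treat `m` as an ordinary image, which is how the low training cost of standard DNN blocks would be retained.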

    A Visionary Approach to Listening: Determining The Role Of Vision In Auditory Scene Analysis

    To recognize and understand the auditory environment, the listener must first separate sounds that arise from different sources and capture each event. This process is known as auditory scene analysis. The aim of this thesis is to investigate whether and how visual information can influence auditory scene analysis. The thesis consists of four chapters. First, I reviewed the literature to give a clear framework for the impact of visual information on the analysis of complex acoustic environments. In Chapter II, I examined psychophysically whether temporal coherence between auditory and visual stimuli was sufficient to promote auditory stream segregation in a mixture. I found that listeners were better able to report brief deviants in an amplitude-modulated target stream when a visual stimulus changed in size in a temporally coherent manner than when the visual stream was coherent with the non-target auditory stream. This work demonstrates that temporal coherence between auditory and visual features can influence the way people analyse an auditory scene. In Chapter III, the integration of auditory and visual features in auditory cortex was examined by recording neuronal responses in awake and anaesthetised ferret auditory cortex to the modified stimuli used in Chapter II. I demonstrated that temporal coherence between auditory and visual stimuli enhances the neural representation of a sound and influences which sound a neuron represents in a sound mixture. Visual stimuli elicited reliable changes in the phase of the local field potential, which provides mechanistic insight into this finding. Together these findings provide evidence that early cross-modal integration underlies the behavioural effects in Chapter II.
Finally, in Chapter IV, I investigated whether training can influence listeners' ability to utilize visual cues for auditory stream analysis, and showed that this ability improved when listeners were trained to detect auditory-visual temporal coherence.
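The key manipulation in this thesis, a visual stimulus that is temporally coherent with one auditory stream but not the other, can be sketched by correlating an auditory amplitude envelope with a visual size signal. This is a minimal sketch with made-up stimulus parameters (sampling rate, smoothing window), not the thesis's actual stimuli.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # 2 s at 1 kHz

def slow_envelope():
    """An irregular, slowly varying amplitude envelope (smoothed noise)."""
    return np.abs(np.convolve(rng.standard_normal(n),
                              np.ones(200) / 200, mode="same"))

# Auditory target stream: amplitude-modulated by a slow envelope.
envelope = slow_envelope()

# Coherent visual stream: the disc's radius tracks the target envelope.
radius_coherent = 1.0 + envelope
# Incoherent visual stream: the radius follows an independent envelope.
radius_incoherent = 1.0 + slow_envelope()

# Temporal coherence measured as correlation over the trial.
coh = np.corrcoef(envelope, radius_coherent)[0, 1]
incoh = np.corrcoef(envelope, radius_incoherent)[0, 1]
```

In the coherent condition the correlation is (by construction) perfect, while an independent envelope yields a lower value; the behavioural finding is that this coherence difference determines which auditory stream the visual stimulus aids.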

    Autism: A “Critical Period” Disorder?

    Cortical circuits in the brain are refined by experience during critical periods early in postnatal life. Critical periods are regulated by the balance of excitatory and inhibitory (E/I) neurotransmission in the brain during development. There is now increasing evidence of E/I imbalance in autism, a complex genetic neurodevelopmental disorder diagnosed by abnormal socialization, impaired communication, and repetitive behaviors or restricted interests. The underlying cause is still largely unknown, and there is no fully effective treatment or cure. We propose that alteration of the expression and/or timing of critical period circuit refinement in primary sensory brain areas may significantly contribute to autistic phenotypes, including cognitive and behavioral impairments. Dissection of the cellular and molecular mechanisms governing well-established critical periods represents a powerful tool to identify new potential therapeutic targets to restore normal plasticity and function in affected neuronal circuits.

    Representation of Sounds in Auditory Cortex of Awake Rats (Dissertation)

    This thesis is divided into six chapters (following the Introduction). Each chapter was intended to be self-contained, so they do not have to be read in the order presented. 

The second chapter (Sec. 2) contains a detailed description of the experimental techniques: the surgery, recording, and training procedures we used in awake head-fixed rats. We have also included a detailed description of all sets of stimuli used to probe neurons, the analytical methods used to analyze the data, and the computational models used in other parts of the thesis.

The third chapter (Sec. 3) focuses on the description of single-neuron responses in the primary auditory cortex of awake head-fixed rats. The primary emphasis of this part is on the sparse representation of the various auditory stimuli we used to probe neurons, and on the heterogeneity of single-neuron responses. To characterize population responses to sound in the auditory cortex, we asked "What is the typical response to acoustic stimuli?" instead of the usual "What is the stimulus that evokes a response?" We found that the population response was sparse, with many unresponsive neurons. In addition, the responsive neurons showed a great variety of responses. This heterogeneity of neuronal responses ("response zoo," courtesy of Anthony M. Zádor) was, however, surprisingly well characterized by a lognormal distribution of firing rates. The observation that firing rates in the awake auditory cortex were lognormally distributed was all the more interesting given the lognormal distribution of synaptic weights observed in the cerebral cortex.

The fourth chapter (Sec. 4) focuses on mechanisms that could give rise to the lognormal distribution of firing rates, as well as of synaptic weights. We proposed specific types of correlations among synaptic connections, and formulated a multiplicative learning rule that led to the observed distributions. We were also able to characterize the intracellular activity of neurons in the awake auditory cortex. 
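Why a multiplicative rule produces lognormal distributions follows from the central limit theorem: multiplicative updates make the log-weights take an additive random walk, so the log-weights become normal and the weights themselves lognormal. This is a generic minimal sketch of that mechanism, not the thesis's specific learning rule; the update magnitude and iteration counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
w = np.full(10_000, 1.0)  # synaptic weights, identical at the start

# Multiplicative rule: each update rescales a weight by a small random
# factor, so log(w) accumulates independent increments and, by the
# central limit theorem, approaches a normal distribution.
for _ in range(500):
    w *= np.exp(rng.normal(0.0, 0.05, size=w.size))

def skew(x):
    """Sample skewness: zero for symmetric (e.g. normal) data."""
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

logw = np.log(w)
# log(w) is near-symmetric (normal), while w itself is strongly
# right-skewed with a long tail of rare, very strong synapses.
```

The same logic applies to firing rates if rate changes are proportional to the current rate, which is one way heavy-tailed "response zoo" statistics can arise from a simple update rule.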

The fifth chapter (Sec. 5) contains an analysis of so-called up and down states in the awake auditory cortex. We show that up and down states--the "signature" subthreshold dynamics so often described in various cortical areas of anesthetized animals--were rare in the primary auditory cortex of awake rats; instead, subthreshold dynamics consisted of brief, infrequent fluctuations of the membrane potential.
The experiments described and analyzed in Chapters 2--4 were conducted in naïve awake rats. Because behavior or attention can influence neuronal activity even in primary sensory areas, we developed a setup for head-fixed behavior. 

In the sixth chapter (Sec. 6) we describe the sound discrimination task we used to study behavior in head-fixed rats. We present a comparison of basic behavioral parameters between restrained and unrestrained rats, as well as evidence of nonauditory modulation of single-neuron activity in the auditory cortex.

    Silent Speech Interfaces for Speech Restoration: A Review

    This work was supported in part by the Agencia Estatal de Investigacion (AEI) under Grant PID2019-108040RB-C22/AEI/10.13039/501100011033. The work of Jose A. Gonzalez-Lopez was supported in part by the Spanish Ministry of Science, Innovation and Universities under a Juan de la Cierva-Incorporation Fellowship (IJCI-2017-32926).
    This review summarises the status of silent speech interface (SSI) research. SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication whenever normal verbal communication is not possible or not desirable. In this review, we focus on the first case and present the latest SSI research aimed at providing new alternative and augmentative communication methods for persons with severe speech disorders. SSIs can employ a variety of biosignals to enable silent communication, such as electrophysiological recordings of neural activity, electromyographic (EMG) recordings of vocal tract movements, or the direct tracking of articulator movements using imaging techniques. Depending on the disorder, some sensing techniques may be better suited than others to capture speech-related information. For instance, EMG and imaging techniques are well suited for laryngectomised patients, whose vocal tract remains almost intact even though removal of the vocal folds leaves them unable to speak, but fail for severely paralysed individuals. From the biosignals, SSIs decode the intended message using automatic speech recognition or speech synthesis algorithms. Despite considerable advances in recent years, most present-day SSIs have only been validated in laboratory settings with healthy users. Thus, as discussed in this paper, a number of challenges remain to be addressed in future research before SSIs can be promoted to real-world applications. 
If these issues can be addressed successfully, future SSIs will improve the lives of persons with severe speech impairments by restoring their communication capabilities.

    CORTICAL DYNAMICS OF AUDITORY-VISUAL SPEECH: A FORWARD MODEL OF MULTISENSORY INTEGRATION.

    In noisy settings, seeing the interlocutor's face helps to disambiguate what is being said. For this to happen, the brain must integrate auditory and visual information. Three major problems are (1) bringing together separate sensory streams of information, (2) extracting auditory and visual speech information, and (3) identifying this information as a unified auditory-visual percept. In this dissertation, a new representational framework for auditory-visual (AV) speech integration is offered. The experimental work (psychophysics and electrophysiology (EEG)) suggests specific neural mechanisms for solving problems (1), (2), and (3) that are consistent with a (forward) 'analysis-by-synthesis' view of AV speech integration. In Chapter I, multisensory perception and integration are reviewed. A unified conceptual framework serves as background for the study of AV speech integration. In Chapter II, psychophysical experiments testing the perception of desynchronized AV speech inputs show the existence of a ~250ms temporal window of integration in AV speech. In Chapter III, an EEG study shows that visual speech modulates the neural processing of auditory speech at an early stage. Two functionally independent modulations are (i) a ~250ms amplitude reduction of auditory evoked potentials (AEPs) and (ii) a systematic temporal facilitation of the same AEPs as a function of the saliency of the visual speech. In Chapter IV, an EEG study of desynchronized AV speech inputs shows that (i) fine-grained (gamma, ~25ms) and (ii) coarse-grained (theta, ~250ms) neural mechanisms simultaneously mediate the processing of AV speech. In Chapter V, a new illusory effect is proposed, in which non-speech visual signals modify the perceptual quality of auditory objects. EEG results show very different patterns of activation from those observed in AV speech integration. An MEG experiment is subsequently proposed to test hypotheses on the origins of these differences. 
In Chapter VI, the 'analysis-by-synthesis' model of AV speech integration is contrasted with major speech theories. From a Cognitive Neuroscience perspective, the 'analysis-by-synthesis' model is argued to offer the most sensible representational system for AV speech integration. This thesis shows that AV speech integration results from both the statistical nature of stimulation and the inherent predictive capabilities of the nervous system.
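The pairing of integration grains with oscillation bands in Chapter IV follows from the reciprocal relation f = 1/T between a temporal window and its characteristic frequency. A one-line check of that arithmetic:

```python
def band_frequency(period_s):
    """Characteristic frequency (Hz) of a temporal integration window."""
    return 1.0 / period_s

gamma_hz = band_frequency(0.025)  # ~25 ms grain -> 40 Hz, in the gamma band
theta_hz = band_frequency(0.250)  # ~250 ms grain -> 4 Hz, in the theta band
```

This is why the ~25ms fine grain is described as gamma-band and the ~250ms coarse grain (matching the psychophysical integration window of Chapter II) as theta-band.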

    States and sequences of paired subspace ideals and their relationship to patterned brain function

    It is found here that the state of a network of coupled ordinary differential equations is partially localizable through a pair of contractive ideal subspaces, chosen from dual complete lattices related to the synchrony and synchronization of cells within the network. The first lattice is comprised of polydiagonal subspaces, corresponding to synchronous activity patterns that arise from functional equivalences of cell receptive fields. This lattice is dual to a transdiagonal subspace lattice ordering subspaces transverse to these network-compatible synchronies. Combinatorial consideration of contracting polydiagonal and transdiagonal subspace pairs yields a rich array of dynamical possibilities for structured networks. After proving that contraction commutes with the lattice ordering, it is shown that subpopulations of cells are left at fixed potentials when pairs of contracting subspaces span the cells' local coordinates - a phenomenon named glyph formation here. Treatment of mappings between paired states then leads to a theory of network-compatible sequence generation. The theory's utility is illustrated with examples ranging from the construction of a minimal circuit for encoding a simple phoneme to a model of the primary visual cortex including high-dimensional environmental inputs, laminar specificity, spiking discontinuities, and time delays. In this model, glyph formation and dissolution provide one account for an unexplained anomaly in electroencephalographic recordings under periodic flicker, where stimulus frequencies differing by as little as 1 Hz generate responses varying by an order of magnitude in alpha-band spectral power. Further links between coupled-cell systems and neural dynamics are drawn through a review of synchronization in the brain and its relationship to aggregate observables, focusing again on electroencephalography. 
Given previous theoretical work relating the geometry of visual hallucinations to symmetries in visual cortex, periodic perturbation of the visual system along a putative symmetry axis is hypothesized to lead to a greater concentration of harmonic spectral energy than asymmetric perturbations; preliminary experimental evidence affirms this hypothesis. To conclude, connections drawn between dynamics, sensation, and behavior are distilled to seven hypotheses, and the potential medical uses of the theory are illustrated with a lattice depiction of ketamine-xylazine anaesthesia and a reinterpretation of hemifield neglect.
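The central object here, a polydiagonal subspace, can be made concrete with a tiny coupled-cell system: when two cells have identical receptive fields (identical rows of the coupling matrix), the subspace where their states agree is invariant under the flow. This is a minimal sketch with an arbitrary firing-rate model and hypothetical coupling weights, not the abstract's specific networks.

```python
import numpy as np

def step(x, W, dt=0.01):
    """One Euler step of the coupled-cell network dx/dt = -x + W @ tanh(x)."""
    return x + dt * (-x + W @ np.tanh(x))

# Cells 0 and 1 have identical rows of W (functionally equivalent
# receptive fields), so the polydiagonal subspace {x0 = x1} is
# flow-invariant: their derivatives agree whenever their states do.
W = np.array([[0.0, 0.5, 1.0],
              [0.0, 0.5, 1.0],   # same row as cell 0
              [0.3, 0.3, 0.0]])

x = np.array([0.7, 0.7, -0.2])   # start on the polydiagonal x0 = x1
for _ in range(1000):
    x = step(x, W)
# The synchrony x0 = x1 persists; cell 2 evolves independently of it.
```

The lattices in the text then order all such synchrony patterns by refinement; whether a given polydiagonal also attracts nearby states (contraction) is the extra condition the paper pairs with transdiagonal subspaces.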