
    Neural processing of natural sounds

    Natural sounds include animal vocalizations, environmental sounds such as wind, water, and fire noises, and non-vocal sounds made by animals and humans for communication. These natural sounds have characteristic statistical properties that make them perceptually salient and that drive auditory neurons in optimal regimes for information transmission. Recent advances in statistics and computer science have allowed neurophysiologists to extract the stimulus-response functions of complex auditory neurons from responses to natural sounds. These studies have revealed hierarchical processing that leads to the neural detection of progressively more complex natural sound features, and they have demonstrated the importance of acoustical and behavioral context for neural responses. High-level auditory neurons have been shown to be exquisitely selective for conspecific calls. This fine selectivity could play an important role in species recognition, in vocal learning in songbirds and, in the case of bats, in the processing of the sounds used in echolocation. Research that investigates how communication sounds are categorized into behaviorally meaningful groups (e.g. call types in animals, words in human speech) remains in its infancy. Animals and humans also excel at separating communication sounds from each other and from background noise. Neurons that detect communication calls in noise have been found, but the neural computations involved in sound source separation and natural auditory scene analysis remain poorly understood overall. Future auditory research will therefore have to focus not only on how natural sounds are processed by the auditory system but also on the computations that allow this processing to occur in natural listening situations. The complexity of the computations needed for natural hearing may require the high-dimensional representations provided by ensembles of neurons, and natural sounds may be the best stimuli for understanding this ensemble neural code.
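
    One of the statistical regularities alluded to above is that the power in a natural sound's amplitude envelope tends to fall off with modulation frequency. The sketch below is a minimal, illustrative way to estimate an envelope modulation spectrum with numpy; the crude rectification-based envelope and the synthetic test signal are assumptions for demonstration, not the analyses used in the reviewed studies.

```python
import numpy as np

def modulation_spectrum(x, fs):
    """Power spectrum of a sound's amplitude envelope."""
    envelope = np.abs(x)                    # crude rectification-based envelope
    envelope = envelope - envelope.mean()   # drop DC so the fall-off is visible
    power = np.abs(np.fft.rfft(envelope)) ** 2
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    return freqs, power

# Toy check: noise whose envelope is modulated at 4 Hz stands in for a
# natural sound; the strongest modulation component should sit near 4 Hz.
fs = 16000
t = np.arange(3 * fs) / fs
rng = np.random.default_rng(0)
x = rng.normal(size=t.size) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
freqs, power = modulation_spectrum(x, fs)
print(freqs[1:][np.argmax(power[1:])])      # ~4.0
```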

    Understanding Auditory Spectro-Temporal Receptive Fields and Their Changes with Input Statistics by Efficient Coding Principles

    Spectro-temporal receptive fields (STRFs) have been widely used as linear approximations to the signal transform from sound spectrograms to neural responses along the auditory pathway. Their dependence on statistical attributes of the stimuli, such as sound intensity, is usually explained by nonlinear mechanisms and models. Here, we apply an efficient coding principle, which has been successfully used to understand receptive fields in early stages of visual processing, in order to provide a computational understanding of STRFs. According to this principle, STRFs result from an optimal tradeoff between maximizing the sensory information the brain receives and minimizing the cost of the neural activities required to represent and transmit this information. Both terms depend on the statistical properties of the sensory inputs and the noise that corrupts them. The STRFs should therefore depend on the input power spectrum and the signal-to-noise ratio, which is assumed to increase with input intensity. We analytically derive the optimal STRFs when signal and noise are approximated as Gaussians. Under the constraint that they should be spectro-temporally local, the STRFs are predicted to adapt from band-pass to low-pass filters as the input intensity decreases, or as the input correlation becomes longer range in sound frequency or time. These predictions qualitatively match physiological observations. Our prediction of how the STRFs should be determined by the input power spectrum could readily be tested, since this spectrum depends on the stimulus ensemble. The potential and limitations of the efficient coding principle are discussed.
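
    As a rough illustration of the tradeoff described above, the sketch below combines a whitening term with a Wiener-style noise-suppression term, a composite filter form often used in efficient-coding analyses (in the Atick-Redlich tradition); it is not the paper's own derivation, and the 1/f² input spectrum and flat noise power are assumed purely for demonstration.

```python
import numpy as np

f = np.linspace(0.01, 10, 500)   # sound frequency axis (arbitrary units)
S = 1.0 / f**2                   # assumed input power spectrum (roughly natural)

def gain(S, N):
    """Whitening (1/sqrt(S)) tempered by Wiener suppression (S/(S+N))."""
    return (S / (S + N)) / np.sqrt(S)

for N, label in [(1e-4, "high intensity (high SNR)"),
                 (1e-1, "low intensity (low SNR)")]:
    print(f"{label}: gain peaks at f = {f[np.argmax(gain(S, N))]:.2f}")
```

    With these assumed forms the gain peaks near f = 1/sqrt(N), so the peak slides toward lower frequencies as the noise power grows, mirroring the predicted shift from band-pass toward low-pass filtering at low intensity.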

    Computational and Perceptual Characterization of Auditory Attention

    Humans are remarkably capable of making sense of a busy acoustic environment in real time, despite the constant cacophony of sounds reaching our ears. Attention is a key component of the system that parses sensory input, allocating limited neural resources to the elements with the highest informational value to drive cognition and behavior. The focus of this thesis is the perceptual, neural, and computational characterization of auditory attention. Pioneering studies exploring human attention to natural scenes came from the visual domain, spawning a number of hypotheses on how attention operates along the visual pathway, as well as a considerable amount of computational work attempting to model human perception. Comparatively, our understanding of auditory attention is still very elementary, particularly as it pertains to attention automatically drawn to salient sounds in the environment, such as a loud explosion. In this work, we explore how human perception is affected by the saliency of sound, characterized across a variety of acoustic features such as pitch, loudness, and timbre. Insight from psychoacoustical data is complemented with neural measures of attention recorded directly from the brain using electroencephalography (EEG). A computational model of attention is presented that tracks the statistical regularities of incoming sound in a high-dimensional feature space to build predictions of future feature values. The model determines salient time points that will attract attention by comparing its predictions to the observed sound features. The high degree of agreement between the model and human experimental data suggests predictive coding as a potential mechanism of attention in the auditory pathway. We investigate different modes of volitional attention to natural acoustic scenes with a "cocktail-party" simulation. We argue that the auditory system can direct attention in at least three distinct ways (globally, based on features, and based on objects) and that perception can be altered depending on how attention is deployed. Further, we illustrate how the saliency of sound affects these various modes of attention. The results of this work improve our understanding of auditory attention, highlighting the temporally evolving nature of sound as a significant distinction between audition and vision, with a focus on using natural scenes that engage the full capability of human attention.
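
    A minimal sketch of the predictive-tracking idea described above: maintain a running Gaussian belief over each acoustic feature and score each frame by how strongly it deviates from the prediction. This is a generic illustration of the approach, not the thesis's actual model; the feature array, learning rate, and test signal are hypothetical.

```python
import numpy as np

def saliency(features, alpha=0.05, eps=1e-6):
    """features: (T, D) array of per-frame acoustic features (e.g. pitch,
    loudness, timbre dimensions). Returns a (T,) surprise trace: squared
    z-score of each observation under the running prediction."""
    T, D = features.shape
    mean = features[0].copy()
    var = np.ones(D)
    surprise = np.zeros(T)
    for t in range(1, T):
        z2 = (features[t] - mean) ** 2 / (var + eps)
        surprise[t] = z2.sum()                           # prediction error
        mean = (1 - alpha) * mean + alpha * features[t]  # update beliefs
        var = (1 - alpha) * var + alpha * (features[t] - mean) ** 2
    return surprise

# Toy check: a step change in one feature produces a saliency spike.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
x[100:, 0] += 5.0
print(np.argmax(saliency(x)))   # ~100
```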

    Characterizing and comparing acoustic representations in convolutional neural networks and the human auditory system

    Auditory processing in the human brain and in contemporary machine hearing systems consists of a cascade of representational transformations that extract and reorganize relevant information to enable task performance. This thesis is concerned with the nature of acoustic representations and the network design and learning principles that support their development. The primary scientific goals are to characterize and compare auditory representations in deep convolutional neural networks (CNNs) and the human auditory pathway. This work prompts several meta-scientific questions about the nature of scientific progress, which are also considered. The introduction reviews what is currently known about the mammalian auditory pathway and introduces the relevant concepts in deep learning. The first article argues that the most pressing philosophical questions at the intersection of artificial and biological intelligence are ultimately concerned with defining the phenomena to be explained and with what constitutes a valid explanation of such phenomena. I highlight relevant theories of scientific explanation which I hope will provide scaffolding for future discussion. Article 2 tests a popular model of auditory cortex based on frequency-specific spectrotemporal modulations. We find that a linear model trained only on BOLD responses to simple dynamic ripples (containing only one fundamental frequency, temporal modulation rate, and spectral scale) can generalize to predict responses to mixtures of two dynamic ripples. Both the third and fourth articles investigate how CNN representations are affected by various aspects of training. The third article characterizes the language specificity of CNN layers and explores the effects of freeze training and random weights. We observed three distinct regions of transferability: (1) the first two layers were entirely transferable between languages, (2) layers 2-8 were also highly transferable but showed some evidence of language specificity, and (3) the subsequent fully connected layers were more language specific but could be successfully fine-tuned to the target language. In Article 4, we use similarity analysis to show that the superior performance of freeze training achieved in Article 3 can be largely attributed to representational differences in the penultimate layer: the second fully connected layer. We also analyze the random networks from Article 3, from which we conclude that representational form is doubly constrained by architecture and by the form of the input and target. To test whether acoustic CNNs learn a representational hierarchy similar to that of the human auditory system, the fifth article presents a similarity analysis comparing the activity of the freeze-trained networks from Article 3 to 7T fMRI activity throughout the human auditory system. We find no evidence of a shared representational hierarchy and instead find that all of our auditory regions were most similar to the first fully connected layer. Finally, the discussion chapter reviews the merits and limitations of a deep learning approach to neuroscience in a model comparison framework. Together, these works contribute to the nascent enterprise of modeling the auditory system with neural networks and constitute a small step towards a unified science of intelligence that studies the phenomena exhibited in both biological and artificial intelligence.
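
    The layer-to-brain comparisons described above rely on some form of similarity analysis. The sketch below shows one common recipe, representational similarity analysis (RSA) with rank-correlated dissimilarity matrices; the thesis's exact similarity method may differ, and the toy fMRI patterns and layer names here are hypothetical.

```python
import numpy as np

def rdm(X):
    """Representational dissimilarity matrix: 1 - Pearson r between the
    rows of X, i.e. between per-stimulus activity patterns."""
    return 1.0 - np.corrcoef(X)

def rsa_score(acts, brain):
    """Spearman-style correlation of the upper triangles of two RDMs.
    acts: (n_stimuli, n_units); brain: (n_stimuli, n_voxels)."""
    iu = np.triu_indices(acts.shape[0], k=1)
    a, b = rdm(acts)[iu], rdm(brain)[iu]
    ar = np.argsort(np.argsort(a)).astype(float)   # rank transform
    br = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ar, br)[0, 1]

# Toy comparison: random layer activations vs. random fMRI patterns
# should score near zero for every layer.
rng = np.random.default_rng(1)
brain = rng.normal(size=(40, 500))
layer_acts = {f"layer{i}": rng.normal(size=(40, 64)) for i in range(1, 4)}
for name, acts in layer_acts.items():
    print(name, round(rsa_score(acts, brain), 3))
```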

    Models of Neuronal Stimulus-Response Functions: Elaboration, Estimation, and Evaluation

    Rich, dynamic, and dense sensory stimuli are encoded within the nervous system by the time-varying activity of many individual neurons. A fundamental approach to understanding the nature of the encoded representation is to characterize the function that relates the moment-by-moment firing of a neuron to the recent history of a complex sensory input. This review provides a unifying and critical survey of the techniques that have been brought to bear on this effort thus far, ranging from the classical linear receptive field model to modern approaches incorporating normalization and other nonlinearities. We address separately the structure of the models; the criteria and algorithms used to identify the model parameters; and the role of regularizing terms or “priors.” In each case we consider the benefits and drawbacks of the various proposals, providing examples of when these methods work and when they may fail. Emphasis is placed on key concepts rather than mathematical details, so as to make the discussion accessible to readers from outside the field. Finally, we review ways in which the agreement between an assumed model and the neuron's response may be quantified. Re-implemented and unified code for many of the methods is made freely available.
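
    As a concrete instance of the classical regularized linear receptive field model the review starts from, the sketch below fits a spectro-temporal weight matrix by ridge regression of a neuron's response on the recent stimulus history. It is a minimal illustration in the spirit of the review, not its released code; the variable names, lag count, and ridge penalty are illustrative.

```python
import numpy as np

def fit_linear_rf(stim, resp, n_lags=20, ridge=1.0):
    """Ridge estimate of an (n_lags, n_freq) receptive field.
    stim: (T, n_freq) spectrogram; resp: (T,) response trace."""
    T, F = stim.shape
    X = np.zeros((T - n_lags, n_lags * F))    # delayed-stimulus design matrix
    for t in range(n_lags, T):
        X[t - n_lags] = stim[t - n_lags:t][::-1].ravel()  # row 0 = most recent bin
    y = resp[n_lags:]
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)
    return w.reshape(n_lags, F)

# Toy check: a response driven by a single latency/frequency bin is recovered.
rng = np.random.default_rng(0)
stim = rng.normal(size=(5000, 16))
true = np.zeros((20, 16)); true[2, 8] = 1.0
drive = np.array([(stim[t - 20:t][::-1] * true).sum() for t in range(20, 5000)])
resp = np.concatenate([np.zeros(20), drive + 0.1 * rng.normal(size=drive.size)])
w = fit_linear_rf(stim, resp)
print(np.unravel_index(np.abs(w).argmax(), w.shape))   # (2, 8)
```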

    Cognitive Analysis of Complex Acoustic Scenes

    Natural auditory scenes consist of a rich variety of temporally overlapping sounds that originate from multiple sources and locations and are characterized by distinct acoustic features. It is an important biological task to analyze such complex scenes and extract sounds of interest. This thesis addresses this question, also known as the “cocktail party problem”, by developing an approach based on the analysis of a novel stochastic signal, in contrast to the deterministic narrowband signals used in previous work. This low-level signal, known as the Stochastic Figure-Ground (SFG) stimulus, captures the spectrotemporal complexity of natural sound scenes and enables parametric control of stimulus features. In a series of experiments based on this stimulus, I have investigated specific behavioural and neural correlates of human auditory figure-ground segregation. The thesis is presented in seven chapters. Chapter 1 reviews key aspects of auditory processing and existing models of auditory segregation. Chapter 2 presents the principles of the techniques used, including psychophysics, modeling, functional Magnetic Resonance Imaging (fMRI) and Magnetoencephalography (MEG). Experimental work is presented in the following chapters and covers figure-ground segregation behaviour (Chapter 3), modeling of the SFG stimulus based on a temporal coherence model of auditory perceptual organization (Chapter 4), and analysis of brain activity related to the detection of salient targets in the SFG stimulus using fMRI (Chapter 5) and MEG (Chapter 6). Finally, Chapter 7 concludes with a general discussion of the results and future directions for research. Overall, this body of work emphasizes the use of stochastic signals for auditory scene analysis and demonstrates an automatic, highly robust segregation mechanism in the auditory system that is sensitive to temporal correlations across frequency channels.
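
    For intuition, the sketch below generates a toy stimulus in the spirit of the SFG design described above: a sequence of random multi-tone chords (the "ground") in which, during the figure interval, a fixed set of frequency components repeats coherently across chords. All parameter values (chord duration, tone pool, coherence level) are illustrative rather than those used in the thesis.

```python
import numpy as np

def sfg(n_chords=40, coherence=4, ground=10, fs=16000, chord_dur=0.05,
        figure_span=(20, 40), seed=0):
    """Toy Stochastic Figure-Ground signal: random chords, with a fixed set
    of `coherence` tones repeating across the chords in `figure_span`."""
    rng = np.random.default_rng(seed)
    freqs = np.logspace(np.log10(200), np.log10(7000), 120)  # tone pool (Hz)
    fig_freqs = rng.choice(freqs, coherence, replace=False)  # the "figure"
    n = int(fs * chord_dur)
    t = np.arange(n) / fs
    out = []
    for c in range(n_chords):
        chord_freqs = list(rng.choice(freqs, ground, replace=False))
        if figure_span[0] <= c < figure_span[1]:
            chord_freqs += list(fig_freqs)        # add coherent figure tones
        chord = sum(np.sin(2 * np.pi * f * t) for f in chord_freqs)
        out.append(chord / len(chord_freqs))
    return np.concatenate(out)

signal = sfg()
print(signal.shape)   # (32000,) = 40 chords x 800 samples
```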

    Semantic radical consistency and character transparency effects in Chinese: an ERP study

    BACKGROUND: This event-related potential (ERP) study aims to investigate the representation and temporal dynamics of Chinese orthography-to-semantics mappings by simultaneously manipulating character transparency and semantic radical consistency. Character components, referred to as radicals, make up the building blocks used dur...

    Physiology, Psychoacoustics and Cognition in Normal and Impaired Hearing
