
    Neural mechanisms for voice recognition

    We investigated neural mechanisms that support voice recognition in a training paradigm with fMRI. The same listeners were trained on different weeks to categorize the mid-regions of voice-morph continua as an individual's voice. Stimuli implicitly defined a voice-acoustics space, and training explicitly defined a voice-identity space. The predefined centre of the voice category was shifted from the acoustic centre each week in opposite directions, so the same stimuli had different training histories on different tests. Cortical sensitivity to voice similarity appeared over different time-scales and at different representational stages. First, there were short-term adaptation effects: increasing acoustic similarity to the directly preceding stimulus led to haemodynamic response reduction in the middle/posterior superior temporal sulcus (STS) and in right ventrolateral prefrontal regions. Second, there were longer-term effects: response reduction was found in the orbital/insular cortex for stimuli that were most versus least similar to the acoustic mean of all preceding stimuli, and, in the anterior temporal pole, the deep posterior STS, and the amygdala, for stimuli that were most versus least similar to the trained voice-identity category mean. These findings are interpreted as effects of neural sharpening of long-term stored typical acoustic and category-internal values. The analyses also reveal anatomically separable voice representations: one in a voice-acoustics space and one in a voice-identity space. Voice-identity representations flexibly followed the trained identity shift, and listeners with a greater identity effect were more accurate at recognizing familiar voices. Voice recognition is thus supported by neural voice spaces that are organized around flexible ‘mean voice’ representations.

    On Using Backpropagation for Speech Texture Generation and Voice Conversion

    Inspired by recent work on neural network image generation that relies on backpropagation towards the network inputs, we present a proof-of-concept system for speech texture synthesis and voice conversion based on two mechanisms: approximate inversion of the representation learned by a speech recognition neural network, and matching the statistics of neuron activations between source and target utterances. As in image texture synthesis and neural style transfer, the system works by optimizing a cost function with respect to the input waveform samples. To this end we use a differentiable mel-filterbank feature extraction pipeline and train a convolutional CTC speech recognition network. Our system is able to extract speaker characteristics from very limited amounts of target speaker data, as little as a few seconds, and can be used to generate realistic speech babble or to reconstruct an utterance in a different voice. Comment: Accepted to ICASSP 201
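
    The core move here is gradient descent on the waveform itself rather than on network weights. The sketch below is a minimal PyTorch/torchaudio illustration of only the statistics-matching term, not the paper's implementation: a stand-in convolutional network plays the role of the trained CTC recognizer, and the layer sizes, loss, and optimizer settings are illustrative assumptions. The paper's second mechanism, approximate inversion of the recognizer's representation to preserve source content, is omitted.

        import torch
        import torchaudio

        # Differentiable log-mel front end, standing in for the paper's
        # mel-filterbank feature extraction pipeline.
        mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=40)

        # Stand-in for the trained convolutional CTC recognizer (hypothetical
        # layer sizes; the paper's actual network is assumed to be pretrained).
        recognizer = torch.nn.Sequential(
            torch.nn.Conv1d(40, 128, kernel_size=5, padding=2),
            torch.nn.ReLU(),
            torch.nn.Conv1d(128, 128, kernel_size=5, padding=2),
        )

        def activations(wave):
            # Waveform -> log-mel features -> hidden activations, all differentiable.
            feats = torch.log(mel(wave) + 1e-6).unsqueeze(0)  # (1, n_mels, frames)
            return recognizer(feats)                          # (1, channels, frames)

        def stats_loss(a, b):
            # Match time-averaged activation statistics (the speaker/texture term).
            return ((a.mean(dim=-1) - b.mean(dim=-1)) ** 2).sum()

        target_wave = torch.randn(16000 * 3)               # placeholder target-speaker audio
        wave = torch.randn(16000 * 2, requires_grad=True)  # waveform samples being optimized
        opt = torch.optim.Adam([wave], lr=1e-3)
        target_stats = activations(target_wave).detach()

        for step in range(500):
            opt.zero_grad()
            loss = stats_loss(activations(wave), target_stats)
            loss.backward()   # gradients flow back to the raw waveform samples
            opt.step()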

    A unified coding strategy for processing faces and voices

    Both faces and voices are rich in socially relevant information, which humans are remarkably adept at extracting, including a person's identity, age, gender, affective state, and personality. Here, we review accumulating evidence from behavioral, neuropsychological, electrophysiological, and neuroimaging studies suggesting that the cognitive and neural processing mechanisms engaged by perceiving faces or voices are highly similar, despite the very different nature of their sensory input. The similarity between the two mechanisms likely facilitates the multi-modal integration of facial and vocal information during everyday social interactions. These findings emphasize a parsimonious principle of cerebral organization, where similar computational problems in different modalities are solved using similar solutions.

    Who is that? Brain networks and mechanisms for identifying individuals

    Social animals can identify conspecifics by many forms of sensory input. However, whether the neuronal computations that support this ability to identify individuals rely on modality-independent convergence or involve ongoing synergistic interactions along the multiple sensory streams remains controversial. Direct neuronal measurements at relevant brain sites could address such questions, but this requires better bridging of the work in humans and animal models. Here, we review recent studies in nonhuman primates on voice and face identity-sensitive pathways and evaluate the correspondences to relevant findings in humans. This synthesis provides insights into converging sensory streams in the primate anterior temporal lobe (ATL) for identity processing. Furthermore, we advance a model and suggest how alternative neuronal mechanisms could be tested.

    Similarities in face and voice cerebral processing

    In this short paper I use a few selected examples to illustrate several compelling similarities in the functional organization of face and voice cerebral processing: (1) the presence of cortical areas selective to face or voice stimuli, also observed in non-human primates and causally related to perception; (2) the coding of face or voice identity using a “norm-based” scheme; (3) personality inferences from faces and voices within the same Trustworthiness–Dominance “social space”.
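
    To make the “norm-based” scheme concrete, here is a toy numpy illustration (my own sketch, not from the paper): an identity is coded by its deviation from the population-average face or voice, with the direction of the deviation carrying identity information and its magnitude reflecting distinctiveness. The feature dimensions and population are arbitrary placeholders.

        import numpy as np

        rng = np.random.default_rng(0)
        population = rng.normal(size=(100, 8))  # 100 individuals, 8 arbitrary features
        norm = population.mean(axis=0)          # the prototype ("mean face"/"mean voice")

        def norm_based_code(x):
            deviation = x - norm                # direction in face/voice space ~ identity
            return deviation, float(np.linalg.norm(deviation))  # magnitude ~ distinctiveness

        code, distinctiveness = norm_based_code(population[0])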

    Visual mechanisms for voice‐identity recognition flexibly adjust to auditory noise level

    Recognising the identity of voices is a key ingredient of communication. Visual mechanisms support this ability: recognition is better for voices previously learned with their corresponding face (compared to a control condition). This so-called 'face-benefit' is supported by the fusiform face area (FFA), a region sensitive to facial form and identity. Behavioural findings indicate that the face-benefit increases in noisy listening conditions. The neural mechanisms for this increase are unknown. Here, using functional magnetic resonance imaging, we examined responses in face-sensitive regions while participants recognised the identity of auditory-only speakers (previously learned by face) in high (SNR -4 dB) and low (SNR +4 dB) levels of auditory noise. We observed a face-benefit in both noise levels for most participants (16 of 21). In the high-noise condition, the recognition of face-learned speakers engaged the right posterior superior temporal sulcus motion-sensitive face area (pSTS-mFA), a region implicated in the processing of dynamic facial cues. The face-benefit in high noise also correlated positively with increased functional connectivity between this region and voice-sensitive regions in the temporal lobe in the group of 16 participants with a behavioural face-benefit. In the low-noise condition, the face-benefit was robustly associated with increased responses in the FFA and, to a lesser extent, the right pSTS-mFA. The findings highlight the remarkably adaptive nature of the visual network supporting voice-identity recognition in auditory-only listening conditions.

    Attention-Based Models for Text-Dependent Speaker Verification

    Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning, owing to their ability to summarize relevant information that is spread across the entire length of an input sequence. In this paper, we analyze the use of attention mechanisms for sequence summarization in our end-to-end text-dependent speaker recognition system. We explore different topologies of the attention layer and their variants, and compare different pooling methods on the attention weights. Ultimately, we show that attention-based models can improve the Equal Error Rate (EER) of our speaker verification system by a relative 14% compared to our non-attention LSTM baseline model. Comment: Submitted to ICASSP 201
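
    The general idea of attention-based sequence summarization can be sketched in a few lines of PyTorch: a scalar score is computed per frame, softmax-normalized into attention weights, and used to pool the LSTM outputs into a fixed-size utterance embedding. Layer sizes and the scoring function below are illustrative assumptions, not the paper's exact topology or its pooling variants.

        import torch

        class AttentivePooling(torch.nn.Module):
            def __init__(self, n_feats=40, hidden=128):
                super().__init__()
                self.lstm = torch.nn.LSTM(n_feats, hidden, batch_first=True)
                self.score = torch.nn.Linear(hidden, 1)  # scalar attention score per frame

            def forward(self, feats):                    # feats: (batch, frames, n_feats)
                h, _ = self.lstm(feats)                  # h: (batch, frames, hidden)
                w = torch.softmax(self.score(h), dim=1)  # attention weights over frames
                return (w * h).sum(dim=1)                # weighted sum -> fixed-size embedding

        # Usage: embed two 80-frame utterances; for verification, an embedding would be
        # scored against an enrolled speaker's embedding (e.g. by cosine similarity).
        emb = AttentivePooling()(torch.randn(2, 80, 40))  # -> shape (2, 128)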

    Event-Related Potentials and Emotion Processing in Child Psychopathology

    In recent years there has been increasing interest in the neural mechanisms underlying altered emotional processes in children and adolescents with psychopathology. This review provides a brief overview of the most up-to-date findings on event-related potentials (ERPs) to facial and vocal emotional expressions in the most common child psychopathological conditions. With regard to externalising behaviour (e.g. ADHD, conduct disorder), ERP studies show enhanced early components to anger, reflecting enhanced sensory processing, followed by reductions in later components to anger, reflecting reduced cognitive-evaluative processing. With regard to internalising behaviour, research supports models of increased processing of threat stimuli, especially at later, more elaborate and effortful stages. Finally, in autism spectrum disorders, abnormalities have been observed at early visual-perceptual stages of processing. An affective neuroscience framework for understanding child psychopathology can be valuable in elucidating underlying mechanisms and in informing preventive interventions.