
    Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect

    Full text link
    We study the cocktail party problem and propose a novel attention network called Tune-In, short for training under negative environments with interference. It first learns two separate spaces of speaker knowledge and speech stimuli on top of a shared feature space, where a new block structure is designed as the building block for all spaces, and then cooperatively solves different tasks. Between the two spaces, information is cast towards each other via a novel cross- and dual-attention mechanism, mimicking the bottom-up and top-down processes of the human cocktail party effect. It turns out that substantially discriminative and generalizable speaker representations can be learnt in severely interfered conditions via our self-supervised training. The experimental results verify this seeming paradox. The learnt speaker embedding has greater discriminative power than a standard speaker verification method; meanwhile, Tune-In consistently achieves remarkably better speech separation performance in terms of SI-SNRi and SDRi in all test modes, and at lower memory and computational cost, than state-of-the-art benchmark systems. Comment: Accepted in AAAI 202
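    SI-SNRi, the separation metric quoted above, is the improvement in scale-invariant signal-to-noise ratio of the separated signal over the raw mixture. A minimal sketch of how it is typically computed (function names are illustrative, not from the paper):

```python
import numpy as np

def si_snr(ref, est, eps=1e-8):
    """Scale-invariant SNR in dB (higher is better)."""
    ref = ref - ref.mean()
    est = est - est.mean()
    # Project the estimate onto the reference: the scale-invariant target.
    s_target = (est @ ref) / (ref @ ref + eps) * ref
    e_noise = est - s_target
    return 10 * np.log10((s_target @ s_target) / (e_noise @ e_noise + eps) + eps)

def si_snr_improvement(ref, est, mixture):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(ref, est) - si_snr(ref, mixture)
```

    Because of the projection step, the metric is insensitive to the overall scale of the estimate, which is why it is preferred over plain SNR for separation systems.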

    Brain Learning, Attention, and Consciousness

    Full text link
    The processes whereby our brains continue to learn about a changing world in a stable fashion throughout life are proposed to lead to conscious experiences. These processes include the learning of top-down expectations, the matching of these expectations against bottom-up data, the focusing of attention upon the expected clusters of information, and the development of resonant states between bottom-up and top-down processes as they reach an attentive consensus between what is expected and what is there in the outside world. It is suggested that all conscious states in the brain are resonant states, and that these resonant states trigger learning of sensory and cognitive representations. The models that summarize these concepts are therefore called Adaptive Resonance Theory, or ART, models. Psychophysical and neurobiological data in support of ART are presented from early vision, visual object recognition, auditory streaming, variable-rate speech perception, somatosensory perception, and cognitive-emotional interactions, among others. It is noted that ART mechanisms seem to be operative at all levels of the visual system, and it is proposed how these mechanisms are realized by known laminar circuits of visual cortex. It is predicted that the same circuit realization of ART mechanisms will be found in the laminar circuits of all sensory and cognitive neocortex. Concepts and data are summarized concerning how some visual percepts may be visibly, or modally, perceived, whereas amodal percepts may be consciously recognized even though they are perceptually invisible. It is also suggested that sensory and cognitive processing in the What processing stream of the brain obeys top-down matching and learning laws that are often complementary to those used for spatial and motor processing in the brain's Where processing stream. 
This enables our sensory and cognitive representations to maintain their stability as we learn more about the world, while allowing spatial and motor representations to forget learned maps and gains that are no longer appropriate as our bodies develop and grow from infanthood to adulthood. Procedural memories are proposed to be unconscious because the inhibitory matching process that supports these spatial and motor processes cannot lead to resonance. Defense Advanced Research Projects Agency; Office of Naval Research (N00014-95-1-0409, N00014-95-1-0657); National Science Foundation (IRI-97-20333)

    Enhanced amplitude modulations contribute to the Lombard intelligibility benefit: Evidence from the Nijmegen Corpus of Lombard Speech

    No full text
    Speakers adjust their voice when talking in noise, which is known as Lombard speech. These acoustic adjustments facilitate speech comprehension in noise relative to plain speech (i.e., speech produced in quiet). However, exactly which characteristics of Lombard speech drive this intelligibility benefit in noise remains unclear. This study assessed the contribution of enhanced amplitude modulations to the Lombard speech intelligibility benefit by demonstrating that (1) native speakers of Dutch in the Nijmegen Corpus of Lombard Speech (NiCLS) produce more pronounced amplitude modulations in noise vs. in quiet; (2) more enhanced amplitude modulations correlate positively with intelligibility in a speech-in-noise perception experiment; (3) transplanting the amplitude modulations from Lombard speech onto plain speech leads to an intelligibility improvement, suggesting that enhanced amplitude modulations in Lombard speech contribute towards intelligibility in noise. Results are discussed in light of recent neurobiological models of speech perception with reference to neural oscillators phase-locking to the amplitude modulations in speech, guiding the processing of speech.
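    The envelope-transplant manipulation in (3) can be sketched with standard signal-processing tools: extract each recording's slow amplitude envelope, then rescale the plain-speech waveform by the ratio of the Lombard envelope to its own. This is a minimal illustration, not the corpus's actual pipeline; the function names and the 10 Hz cutoff are assumptions.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def amplitude_envelope(x, fs, cutoff=10.0):
    """Slow amplitude envelope: |Hilbert(x)| low-pass filtered (cutoff in Hz)."""
    env = np.abs(hilbert(x))
    b, a = butter(2, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, env)

def transplant_envelope(plain, lombard, fs, eps=1e-8):
    """Impose the Lombard envelope onto the plain-speech waveform (illustrative)."""
    env_plain = amplitude_envelope(plain, fs)
    env_lombard = amplitude_envelope(lombard, fs)
    # Divide out the plain envelope, then apply the Lombard one.
    return plain / (env_plain + eps) * env_lombard
```

    The low-pass cutoff determines which modulation rates count as "envelope"; speech-intelligibility work typically focuses on modulations below roughly 10-16 Hz.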

    Towards Cognizant Hearing Aids: Modeling of Content, Affect and Attention

    Get PDF

    The directional effect of target position on spatial selective auditory attention

    Get PDF
    Spatial selective auditory attention plays a crucial role in listening in a mixture of competing speech sounds. Previous neuroimaging studies have reported alpha band neural activity modulated by auditory attention, along with alpha lateralization corresponding to attentional focus. A greater cortical representation of the attended speech envelope compared to the ignored speech envelope was also found, a phenomenon known as ‘neural speech tracking’. However, little is known about the neural activities when attentional focus is directed on speech sounds from behind the listener, even though understanding speech from behind is a common and essential aspect of daily life. The objectives of this study are to investigate the impact of four distinct target positions (left, right, front, and particularly, behind) on spatial selective auditory attention by concurrently assessing 1) spatial selective speech identification, 2) oscillatory alpha-band power, and 3) neural speech tracking. Fifteen young adults with normal hearing (NH) were enrolled in this study (M = 21.40, ages 18-29; 10 females). The selective speech identification task indicated that the target position presented at the back was the most challenging condition, followed by the front condition, with the lateral condition being the least demanding. The normalized alpha power was modulated by target position and the power was significantly lateralized to either the left or right side, but not to the front or back. The parieto-occipital alpha power in the front-back configuration was significantly lower than in the left-right listening configuration, and the normalized alpha power in the back condition was significantly higher than in the front condition. The speech tracking of the to-be-attended speech envelope was affected by the direction of the target stream. 
The behavioral outcome (selective speech identification) was correlated with parieto-occipital alpha power and with the neural speech tracking correlation coefficient as neural correlates of auditory attention, but there was no significant correlation between alpha power and neural speech tracking. The results suggest that, in addition to existing mechanistic theories, it may be necessary to consider how the brain responds depending on the location of a sound in order to interpret the neural correlates and behavioral consequences in a meaningful way, as well as for potential applications of neural speech tracking in studies on spatial selective hearing.
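    The alpha lateralization discussed above is commonly quantified as a normalized power difference between right- and left-hemisphere channels. A minimal sketch assuming two parieto-occipital EEG channels sampled at a common rate; the function names and the Welch-based band-power estimate are illustrative assumptions, not this study's analysis pipeline:

```python
import numpy as np
from scipy.signal import welch

def alpha_band_power(eeg, fs, band=(8.0, 12.0)):
    """Mean power spectral density within the alpha band for one channel."""
    f, pxx = welch(eeg, fs=fs, nperseg=int(fs))  # ~1 Hz frequency resolution
    mask = (f >= band[0]) & (f <= band[1])
    return pxx[mask].mean()

def lateralization_index(left_ch, right_ch, fs):
    """(right - left) / (right + left): positive when alpha power is right-lateralized."""
    p_left = alpha_band_power(left_ch, fs)
    p_right = alpha_band_power(right_ch, fs)
    return (p_right - p_left) / (p_right + p_left)
```

    The index is bounded in [-1, 1], so it can be compared across listeners and attention conditions without normalizing for overall alpha power.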