25,663 research outputs found

What are the Visual Features Underlying Rapid Object Recognition?

Authors: Crouzet Sébastien M.
Serre Thomas
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2011

Machine vision has made very significant research progress in recent years. Robust face detection and identification algorithms are already readily available to consumers, and modern computer vision algorithms for generic object recognition now cope with the richness and complexity of natural visual scenes. Unlike early models of object recognition, which emphasized figure-ground segmentation and the spatial relations between parts, recent successful approaches compute loose collections of image features without prior segmentation or any explicit encoding of spatial relations. While these models remain simplistic models of visual processing, they suggest that, in principle, bottom-up activation of a loose collection of image features could support the rapid recognition of natural object categories and provide an initial coarse visual representation before more complex visual routines and attentional mechanisms come into play. Focusing on biologically plausible computational models of (bottom-up) pre-attentive visual recognition, we review some of the key visual features that have been described in the literature. We discuss the consistency of these feature-based representations with classical theories from visual psychology and test their ability to account for human performance on a rapid object categorization task.

Crossref

Directory of Open Access Journals

PubMed Central

Frontiers - Publisher Connector
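
The idea of recognizing objects from a loose collection of image features, with no segmentation and no explicit spatial relations, can be made concrete with a minimal sketch. It assumes hypothetical Gaussian-tuned feature units, each of which reports only its best match anywhere in the image (a global max over positions). The function names, the `sigma` parameter and the toy prototypes are illustrative assumptions, not the models reviewed or tested in the paper.

```python
import math
from typing import List, Sequence


def tuned_response(patch: Sequence[float], prototype: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian tuning: 1.0 for a perfect match, decaying with squared distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(patch, prototype))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def loose_feature_vector(
    patches: List[Sequence[float]],
    prototypes: List[Sequence[float]],
    sigma: float = 1.0,
) -> List[float]:
    """One value per prototype: its strongest response over all patches.

    Patch positions are never used, so the result encodes which features
    are present, not where they are or how they are arranged.
    """
    return [max(tuned_response(p, proto, sigma) for p in patches) for proto in prototypes]


# Toy image described by three 2-D patch descriptors (hypothetical values).
patches = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
prototypes = [(1.0, 1.0), (5.0, 5.0)]
features = loose_feature_vector(patches, prototypes)
shuffled = loose_feature_vector(list(reversed(patches)), prototypes)
print(features == shuffled)  # True: rearranging the patches changes nothing
```

Because each unit keeps only its maximum response, permuting the patches leaves the feature vector unchanged, which illustrates the absence of explicit spatial encoding described above.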

Attentive monitoring of multiple video streams driven by a Bayesian foraging strategy

Authors: Boccignone Giuseppe
Napoletano Paolo
Tisato Francesco
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 27/04/2015

In this paper we consider the problem of deploying attention to subsets of video streams in order to collate the data and information most relevant to a given task. We formalize this monitoring problem as a foraging problem. We propose a probabilistic framework that models an observer's attentive behavior as the behavior of a forager. Moment to moment, the forager focuses its attention on the most informative stream/camera, detects interesting objects or activities, or switches to a more profitable stream. The proposed approach is suitable for multi-stream video summarization. It can also serve as a preliminary step for more sophisticated video surveillance, e.g. activity and behavior analysis. Experimental results on the UCR Videoweb Activities Dataset, a publicly available dataset, illustrate the utility of the proposed technique. Comment: Accepted to IEEE Transactions on Image Processing

arXiv.org e-Print Archive

AIR Università degli Studi di Milano
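
The abstract does not spell out the foraging model, so the following is only a simplified stand-in, not the authors' Bayesian framework. It assumes that each camera's profitability is the posterior mean of a Beta-Bernoulli model of event detections, and it swaps in a marginal-value-theorem-style leaving rule: stay while the current stream is at least as good as the average stream. `StreamBelief`, `next_stream` and `switch_cost` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamBelief:
    """Beta posterior over the per-frame probability that a stream shows an event of interest."""
    alpha: float = 1.0  # prior pseudo-count of frames with an event
    beta: float = 1.0   # prior pseudo-count of frames without one

    def update(self, event_detected: bool) -> None:
        """Bayesian update after observing one frame from this stream."""
        if event_detected:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def expected_gain(self) -> float:
        """Posterior mean event probability, used here as the stream's expected gain."""
        return self.alpha / (self.alpha + self.beta)


def next_stream(current: int, beliefs: List[StreamBelief], switch_cost: float = 0.0) -> int:
    """Stay while the current stream is at least as profitable as the average
    stream; otherwise pick the stream with the best gain net of the switching
    cost (which may still turn out to be the current one)."""
    gains = [b.expected_gain() for b in beliefs]
    average = sum(gains) / len(gains)
    if gains[current] >= average:
        return current
    return max(
        range(len(gains)),
        key=lambda i: gains[i] - (0.0 if i == current else switch_cost),
    )


beliefs = [StreamBelief() for _ in range(3)]
for _ in range(3):              # camera 0 shows nothing for three frames
    beliefs[0].update(False)
print(next_stream(0, beliefs))  # 1: camera 0 has dropped below the average
```

In this sketch only the watched stream receives updates, so unobserved cameras keep whatever belief they last had; a nonzero `switch_cost` makes the forager reluctant to move.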

Different effects of adding white noise on cognitive performance of sub-, normal and super-attentive school children

Authors: Bamford S
Barke Edmund
Helps SK
Soderlund GBW
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

Objectives: Noise often has detrimental effects on performance. However, because of the phenomenon of stochastic resonance (SR), auditory white noise (WN) can alter the "signal-to-noise" ratio and improve performance. The Moderate Brain Arousal (MBA) model postulates different levels of internal "neural noise" in individuals with different attentional capacities. This in turn determines the particular WN level most beneficial in each individual case, with one level of WN facilitating poor attenders but hindering super-attentive children. The objective of the present study is to determine whether added WN affects cognitive performance differently in children who differ in attention ability. Methods: Participants were children aged 8 to 10 years, teacher-rated as super-attentive (N = 25), normal-attentive (N = 29), or sub-attentive (N = 36). Two non-executive-function (non-EF) tasks (a verbal episodic recall task and a delayed verbal recognition task) and two executive function (EF) tasks (a visuo-spatial working memory test and a Go-NoGo task) were performed under three WN levels. The non-WN condition was used only to control for potential differences in background noise in the group testing situations. Results: WN affected performance differently in the three groups: adding moderate WN worsened the performance of super-attentive children on both task types and improved EF performance in sub-attentive children. The performance of normal-attentive children was unaffected by WN exposure. The shift from moderate to high levels of WN had little further effect on performance in any group. Significance: The predicted differential effect of WN on performance was confirmed. However, the failure to find evidence for an inverted-U function challenges current theories. Alternative explanations are discussed. We propose that WN therapy should be further investigated as a possible non-pharmacological treatment for inattention.

Ghent University Academic Bibliography
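
Stochastic resonance, the phenomenon the study builds on, can be illustrated with a toy threshold detector: a signal too weak to cross the threshold on its own becomes detectable once noise is added, while very strong noise swamps it again. This is a generic illustration under assumed parameters (signal amplitude, threshold, noise levels, sample count, random seed), not a model of the children's data; the study itself reports no evidence for the inverted-U function that this toy produces.

```python
import math
import random
from typing import List, Sequence


def detector_output(signal: Sequence[float], noise_sd: float, threshold: float,
                    rng: random.Random) -> List[int]:
    """1 whenever signal plus Gaussian noise exceeds the threshold, else 0."""
    return [1 if s + rng.gauss(0.0, noise_sd) > threshold else 0 for s in signal]


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def resonance_curve(noise_levels: Sequence[float], amplitude: float = 0.5,
                    threshold: float = 1.0, n: int = 5000, seed: int = 0) -> List[float]:
    """Signal-output correlation at each noise level for a subthreshold sine input."""
    rng = random.Random(seed)
    signal = [amplitude * math.sin(2.0 * math.pi * t / 50.0) for t in range(n)]
    return [correlation(signal, detector_output(signal, sd, threshold, rng))
            for sd in noise_levels]


curve = resonance_curve([0.0, 0.5, 50.0])
# curve[0] is 0.0: without noise the 0.5-amplitude signal never reaches the threshold of 1.0
```

In this toy, moderate noise (standard deviation 0.5) gives a clearly positive correlation between detector output and signal, whereas very large noise drives it back toward zero.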

ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification

Authors: Demuynck Kris
Desplanques Brecht
Thienpondt Jenthe
Publication venue: International Speech Communication Association
Publication date: 01/01/2020

Current speaker verification techniques rely on a neural network to extract speaker representations. The successful x-vector architecture is a Time Delay Neural Network (TDNN) that applies statistics pooling to project variable-length utterances into fixed-length speaker-characterizing embeddings. In this paper, we propose multiple enhancements to this architecture based on recent trends in the related fields of face verification and computer vision. Firstly, the initial frame layers can be restructured into 1-dimensional Res2Net modules with impactful skip connections. Similar to SE-ResNet, we introduce Squeeze-and-Excitation (SE) blocks in these modules to explicitly model channel interdependencies. The SE block expands the temporal context of the frame layer by rescaling the channels according to global properties of the recording. Secondly, neural networks are known to learn hierarchical features, with each layer operating on a different level of complexity. To leverage this complementary information, we aggregate and propagate features of different hierarchical levels. Finally, we improve the statistics pooling module with channel-dependent frame attention. This enables the network to focus on a different subset of frames when estimating each channel's statistics. The proposed ECAPA-TDNN architecture significantly outperforms state-of-the-art TDNN-based systems on the VoxCeleb test sets and the 2019 VoxCeleb Speaker Recognition Challenge. Comment: Proceedings of INTERSPEECH 2020

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography
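
Two of the components named in the abstract, the Squeeze-and-Excitation (SE) block and channel-dependent attentive statistics pooling, can be sketched in plain Python on a channels-by-frames feature map. This is a minimal illustration that assumes the weights and the attention logits are simply given (in the actual network they are learned, and the logits are computed from the frame features by additional layers). The function names are hypothetical, and this is not the reference ECAPA-TDNN implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = channels, columns = frames


def _sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def squeeze_excite(x: Matrix, w1: Matrix, b1: List[float],
                   w2: Matrix, b2: List[float]) -> Matrix:
    """Rescale each channel by a gate computed from utterance-level channel means."""
    z = [sum(row) / len(row) for row in x]  # squeeze: mean over all frames
    hidden = [max(0.0, sum(w * zc for w, zc in zip(wr, z)) + b)  # bottleneck + ReLU
              for wr, b in zip(w1, b1)]
    gates = [_sigmoid(sum(w * h for w, h in zip(wr, hidden)) + b)  # one gate per channel
             for wr, b in zip(w2, b2)]
    return [[g * v for v in row] for g, row in zip(gates, x)]


def attentive_stats_pool(x: Matrix, logits: Matrix) -> List[float]:
    """Channel-dependent attentive statistics pooling.

    logits[c][t] is an attention score for frame t of channel c. Returns the
    attention-weighted means followed by the weighted standard deviations.
    """
    means, stds = [], []
    for row, e in zip(x, logits):
        m = max(e)
        w = [math.exp(v - m) for v in e]  # numerically stable softmax over frames
        total = sum(w)
        alpha = [v / total for v in w]    # separate attention weights per channel
        mu = sum(a * v for a, v in zip(alpha, row))
        var = sum(a * v * v for a, v in zip(alpha, row)) - mu * mu
        means.append(mu)
        stds.append(math.sqrt(max(var, 0.0)))
    return means + stds


x = [[1.0, 3.0], [2.0, 2.0]]
print(attentive_stats_pool(x, [[0.0, 0.0], [0.0, 0.0]]))  # [2.0, 2.0, 1.0, 0.0]
```

With uniform attention logits, as in the usage line, the pooling reduces to the plain mean-and-standard-deviation statistics pooling of the x-vector architecture; non-uniform logits let each channel weight different frames.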