34,073 research outputs found

Rapid Visual Categorization is not Guided by Early Salience-Based Selection

Author: Kotseruba Iuliia
Tsotsos John K.
Wloka Calden
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

The current dominant visual processing paradigm in both human and machine research is the feedforward, layered hierarchy of neural-like processing elements. Within this paradigm, visual saliency is seen by many to have a specific role, namely that of early selection. Early selection is thought to enable very fast visual performance by limiting processing to only the most salient candidate portions of an image. This strategy has led to a plethora of saliency algorithms that have indeed improved processing time efficiency in machine algorithms, which in turn have strengthened the suggestion that human vision also employs a similar early selection strategy. However, at least one set of critical tests of this idea has never been performed with respect to the role of early selection in human vision. How would the best of the current saliency models perform on the stimuli used by experimentalists who first provided evidence for this visual processing paradigm? Would the algorithms really provide correct candidate sub-images to enable fast categorization on those same images? Do humans really need this early selection for their impressive performance? Here, we report on a new series of tests of these questions whose results suggest that it is quite unlikely that such an early selection process has any role in human rapid visual categorization.

Comment: 22 pages, 9 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

Machine learning of visual object categorization: an application of the SUSTAIN model

Author: Cangelosi A
Carmantini GS
Wills AJ
Publication venue: Austin, TX
Publication date: 12/08/2014

Formal models of categorization are psychological theories that try to describe the process of categorization in a lawful way, using the language of mathematics. Their mathematical formulation makes it possible for the models to generate precise, quantitative predictions. SUSTAIN (Love, Medin & Gureckis, 2004) is a powerful formal model of categorization that has been used to model a range of human experimental data, describing the process of categorization in terms of an adaptive clustering principle. Love et al. (2004) suggested a possible application of the model in the field of object recognition and categorization. The present study explores this possibility, investigating at the same time the utility of using a formal model of categorization in a typical machine learning task. The image categorization performance of SUSTAIN on a well-known image set is compared with that of a linear Support Vector Machine, confirming the capability of SUSTAIN to perform image categorization with reasonable accuracy, albeit at a rather high computational cost.

Plymouth Electronic Archive and Research Library

How active perception and attractor dynamics shape perceptual categorization: A computational model

Author: Giovanni Pezzulo
Jean Charles Quinton
Nicola Catenacci Volpi
Publication venue: 'Elsevier BV'
Publication date: 23/07/2014

We propose a computational model of perceptual categorization that fuses elements of grounded and sensorimotor theories of cognition with dynamic models of decision-making. We assume that category information consists in anticipated patterns of agent–environment interactions that can be elicited through overt or covert (simulated) eye movements, object manipulation, etc. This information is firstly encoded when category information is acquired, and then re-enacted during perceptual categorization. The perceptual categorization consists in a dynamic competition between attractors that encode the sensorimotor patterns typical of each category; action prediction success counts as “evidence” for a given category and contributes to falling into the corresponding attractor. The evidence accumulation process is guided by an active perception loop, and the active exploration of objects (e.g., visual exploration) aims at eliciting expected sensorimotor patterns that count as evidence for the object category. We present a computational model incorporating these elements and describing action prediction, active perception, and attractor dynamics as key elements of perceptual categorizations. We test the model in three simulated perceptual categorization tasks, and we discuss its relevance for grounded and sensorimotor theories of cognition.

Peer reviewed

Crossref

HAL Clermont Université

University of Hertfordshire Research Archive

What do we perceive in a glance of a real-world scene?

Author: Iyer Asha
Koch Christof
Li Fei Fei
Perona Pietro
Publication venue: 'Association for Research in Vision and Ophthalmology (ARVO)'
Publication date: 01/01/2007

What do we see when we glance at a natural scene and how does it change as the glance becomes longer? We asked naive subjects to report in a free-form format what they saw when looking at briefly presented real-life photographs. Our subjects received no specific information as to the content of each stimulus. Thus, our paradigm differs from previous studies where subjects were cued before a picture was presented and/or were probed with multiple-choice questions. In the first stage, 90 novel grayscale photographs were foveally shown to a group of 22 native-English-speaking subjects. The presentation time was chosen at random from a set of seven possible times (from 27 to 500 ms). A perceptual mask followed each photograph immediately. After each presentation, subjects reported what they had just seen as completely and truthfully as possible. In the second stage, another group of naive individuals was instructed to score each of the descriptions produced by the subjects in the first stage. Individual scores were assigned to more than a hundred different attributes. We show that within a single glance, much object- and scene-level information is perceived by human subjects. The richness of our perception, though, seems asymmetrical. Subjects tend to perceive natural scenes as being outdoor rather than indoor. The reporting of sensory- or feature-level information of a scene (such as shading and shape) consistently precedes the reporting of the semantic-level information. But once subjects recognize more semantic-level components of a scene, there is little evidence suggesting any bias toward either scene-level or object-level recognition.

CiteSeerX

Caltech Authors

A Discriminative Representation of Convolutional Features for Indoor Scene Recognition

Author: Bennamoun Mohammed
Hayat Munawar
Khan Salman H.
Sohel Ferdous
Togneri Roberto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/06/2015

Indoor scene recognition is a multi-faceted and challenging problem due to the diverse intra-class variations and the confusing inter-class similarities. This paper presents a novel approach which exploits rich mid-level convolutional features to categorize indoor scenes. Traditionally used convolutional features preserve the global spatial structure, which is a desirable property for general object recognition. However, we argue that this structuredness is not very helpful when we have large variations in scene layouts, e.g., in indoor scenes. We propose to transform the structured convolutional activations to another highly discriminative feature space. The representation in the transformed space not only incorporates the discriminative aspects of the target dataset, but it also encodes the features in terms of the general object categories that are present in indoor scenes. To this end, we introduce a new large-scale dataset of 1300 object categories which are commonly present in indoor scenes. Our proposed approach achieves a significant performance boost over previous state-of-the-art approaches on five major scene classification datasets.

arXiv.org e-Print Archive

University of Canberra Research Repository

Research Repository

Detecting Sarcasm in Multimodal Social Platforms

Author: Bamman D.
Davidov D.
Frome A.
Ghosh D.
Gibbs R.
González-Ibánez R.
Kincaid J. P.
Mikolov T.
Riloff E.
Tepperman J.
Tsur O.
Veale T.
Verstraten P.
Wang Z.
You Q.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Sarcasm is a peculiar form of sentiment expression, where the surface sentiment differs from the implied sentiment. The detection of sarcasm in social media platforms has been applied in the past mainly to textual utterances where lexical indicators (such as interjections and intensifiers), linguistic markers, and contextual information (such as user profiles, or past conversations) were used to detect the sarcastic tone. However, modern social media platforms allow users to create multimodal messages where audiovisual content is integrated with the text, making the analysis of a mode in isolation partial. In our work, we first study the relationship between the textual and visual aspects in multimodal posts from three major social media platforms, i.e., Instagram, Tumblr and Twitter, and we run a crowdsourcing task to quantify the extent to which images are perceived as necessary by human annotators. Moreover, we propose two different computational frameworks to detect sarcasm that integrate the textual and visual modalities. The first approach exploits visual semantics trained on an external dataset, and concatenates the semantics features with state-of-the-art textual features. The second method adapts a visual neural network initialized with parameters trained on ImageNet to multimodal sarcastic posts. Results show the positive effect of combining modalities for the detection of sarcasm across platforms and methods.

Comment: 10 pages, 3 figures, final version published in the Proceedings of ACM Multimedia 201

arXiv.org e-Print Archive

Crossref

Institutional Research Information System University of Turin

View subspaces for indexing and retrieval of 3D models

Author: Dutagaci Helin
Godil Afzal
Sankur Bulent
Yemez Yücel
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 13/05/2011

View-based indexing schemes for 3D object retrieval are gaining popularity since they provide good retrieval results. These schemes are coherent with the theory that humans recognize objects based on their 2D appearances. The view-based techniques also allow users to search with various queries such as binary images, range images and even 2D sketches. The previous view-based techniques use classical 2D shape descriptors such as Fourier invariants, Zernike moments, Scale Invariant Feature Transform-based local features and 2D Digital Fourier Transform coefficients. These methods describe each object independently of the others. In this work, we explore data-driven subspace models, such as Principal Component Analysis, Independent Component Analysis and Nonnegative Matrix Factorization, to describe the shape information of the views. We treat the depth images obtained from various points of the view sphere as 2D intensity images and train a subspace to extract the inherent structure of the views within a database. We also show the benefit of categorizing shapes according to their eigenvalue spread. Both the shape categorization and data-driven feature set conjectures are tested on the PSB database and compared with the competitor view-based 3D shape retrieval algorithms.

Comment: Three-Dimensional Image Processing (3DIP) and Applications (Proceedings Volume) Proceedings of SPIE Volume: 7526 Editor(s): Atilla M. Baskurt ISBN: 9780819479198 Date: 2 February 201

arXiv.org e-Print Archive

Crossref

Koç University Digital Collections