Semantic Image Retrieval via Active Grounding of Visual Situations
We describe a novel architecture for semantic image retrieval---in
particular, retrieval of instances of visual situations. Visual situations are
concepts such as "a boxing match," "walking the dog," "a crowd waiting for a
bus," or "a game of ping-pong," whose instantiations in images are linked more
by their common spatial and semantic structure than by low-level visual
similarity. Given a query situation description, our architecture---called
Situate---learns models capturing the visual features of expected objects as
well as the expected spatial configuration of relationships among objects. Given a
new image, Situate uses these models in an attempt to ground (i.e., to create a
bounding box locating) each expected component of the situation in the image
via an active search procedure. Situate uses the resulting grounding to compute
a score indicating the degree to which the new image is judged to contain an
instance of the situation. Such scores can be used to rank images in a
collection as part of a retrieval system. In the preliminary study described
here, we demonstrate the promise of this system by comparing Situate's
performance with that of two baseline methods, as well as with a related
semantic image-retrieval system based on "scene graphs."
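As a rough illustration of the final ranking step described above, the sketch below combines per-object grounding scores with a spatial-configuration score and ranks images by the result. The function names, the averaging rule, and the example scores are assumptions for illustration, not Situate's actual scoring model.

```python
# Hypothetical sketch of situation-based ranking; NOT the actual Situate code.
# Assumes each image yields per-component grounding scores in [0, 1] (how well
# a bounding box for each expected object was found) plus a score for how well
# the boxes match the learned spatial configuration.

def situation_score(component_scores, spatial_score):
    """Combine per-object grounding scores with a spatial-configuration score."""
    if not component_scores:
        return 0.0
    object_score = sum(component_scores) / len(component_scores)
    return object_score * spatial_score

def rank_images(groundings):
    """groundings: {image_id: (component_scores, spatial_score)}.
    Returns image ids sorted by descending situation score."""
    scored = {img: situation_score(cs, ss)
              for img, (cs, ss) in groundings.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Toy collection: img1 grounds both expected objects well, img2 does not.
ranking = rank_images({
    "img1": ([0.9, 0.8], 0.9),
    "img2": ([0.4, 0.2], 0.5),
})
```

In a retrieval setting, the returned ordering would serve directly as the ranked result list for the query situation.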
Abstract Concepts: Sensory-Motor Grounding, Metaphors, and Beyond
In the last decade many researchers have obtained evidence for the idea that
cognition shares processing mechanisms with perception and action. Most of
the evidence supporting the grounded cognition framework focused on representations
of concrete concepts, which leaves open the question of how abstract
concepts are grounded in sensory-motor processing. One promising idea is
that people simulate concrete situations and introspective experiences to
represent abstract concepts [Barsalou, L. W., & Wiemer-Hastings, K. (2005).
Situating abstract concepts. In D. Pecher, & R. A. Zwaan (Eds.), Grounding
cognition: The role of perception and action in memory, language, and thinking
(pp. 129–163). Cambridge: Cambridge University Press.], although this idea has
received relatively little investigation so far. A second idea, which more researchers have
investigated, is that people use metaphorical mappings from concrete to
abstract concepts [Lakoff, G., & Johnson, M. (1980). Metaphors we live by.
Chicago: Chicago University Press.]. According to this conceptual metaphor
theory, image schemas structure and provide sensory-motor grounding for
abstract concepts. Although there is evidence that people automatically activate
image schemas when they process abstract concepts, we argue that
situations are also needed to fully represent meaning.
What working memory is for
Glenberg focuses on conceptualizations that change from
moment to moment, yet he dismisses the concept of working memory
(sect. 4.3), which offers an account of temporary storage and on-line
cognition. This commentary questions whether Glenberg's account
adequately caters for observations of consistent data patterns in
temporary storage of verbal and visuospatial information in healthy
adults and in brain-damaged patients with deficits in temporary
retention.
Neurally Implementable Semantic Networks
We propose general principles for semantic networks allowing them to be
implemented as dynamical neural networks. Major features of our scheme include:
(a) the interpretation that each node in a network stands for a bound
integration of the meanings of all nodes and external events the node links
with; (b) the systematic use of nodes that stand for categories or types, with
separate nodes for instances of these types; (c) an implementation of
relationships that does not use intrinsically typed links between nodes.
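Principles (b) and (c) above can be pictured with a toy graph in which category and instance nodes are distinct objects connected by plain, untyped links. This is only an illustrative data structure under those assumptions, not the paper's neural implementation.

```python
# Toy illustration of principles (b) and (c): separate nodes for types and
# their instances, connected by plain untyped links. Hypothetical sketch only.

class Node:
    def __init__(self, name):
        self.name = name
        self.links = set()  # untyped, symmetric links (principle (c))

def link(a, b):
    """Connect two nodes with a plain, label-free link."""
    a.links.add(b)
    b.links.add(a)

dog_type = Node("DOG")    # a node standing for the category/type
fido = Node("fido#1")     # a separate node for one instance of that type
link(fido, dog_type)      # instance-type relation is itself just a link
```

Relationship semantics would then emerge from the pattern of linked nodes rather than from link labels, matching principle (c).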
Similarity learning for person re-identification and semantic video retrieval
Many computer vision problems boil down to the learning of a good visual similarity function that calculates a score of how likely two instances share the same semantic concept. In this thesis, we focus on two problems related to similarity learning: Person Re-Identification, and Semantic Video Retrieval.
Person Re-Identification aims to maintain the identity of an individual in diverse locations through different non-overlapping camera views. Starting with two cameras, we propose a novel visual word co-occurrence based appearance model to measure the similarities between pedestrian images. This model naturally accounts for spatial similarities and variations caused by pose, illumination and configuration changes across camera views. As a generalization to multiple camera views, we introduce the Group Membership Prediction (GMP) problem. The GMP problem involves predicting whether a collection of instances shares the same semantic property. In this context, we propose a novel probability model and introduce latent view-specific and view-shared random variables to jointly account for the view-specific appearance and cross-view similarities among data instances. Our method is tested on various benchmarks, demonstrating accuracy superior to the state of the art.
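To make the core notion of a similarity function concrete, the sketch below scores a pair of feature vectors with cosine similarity and scores a collection by its weakest pairwise link. Both functions are simplified stand-ins; the thesis's actual models (visual word co-occurrence, latent-variable GMP) are considerably more involved.

```python
# Illustrative sketch of similarity scoring; NOT the thesis's actual models.
import math

def cosine_similarity(a, b):
    """Score how likely two feature vectors depict the same concept."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def group_membership_score(instances):
    """Toy stand-in for GMP: score a collection of feature vectors by the
    weakest pairwise similarity, so one outlier lowers the group score."""
    pairs = [(instances[i], instances[j])
             for i in range(len(instances))
             for j in range(i + 1, len(instances))]
    return min(cosine_similarity(a, b) for a, b in pairs) if pairs else 1.0
```

A threshold on such scores would turn them into same-identity / same-group decisions, which is the basic use pattern in re-identification pipelines.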
Semantic Video Retrieval seeks to match complex activities in a surveillance video to user-described queries. In surveillance scenarios, where noise and clutter are usually present, visual uncertainties introduced by error-prone low-level detectors, classifiers and trackers compose a significant part of the semantic gap between user-defined queries and the archive video. To bridge the gap, we propose a novel probabilistic activity localization formulation that incorporates learning of object attributes, between-object relationships, and object re-identification without activity-level training data. Our experiments demonstrate that the introduction of similarity learning components effectively compensates for noise and errors in earlier stages, and results in favorable performance on both aerial and ground surveillance videos.
Considering the computational complexity of our similarity learning models, we attempt to develop a way of training complicated models efficiently while retaining good performance. As a proof of concept, we propose training deep neural networks for supervised learning of hash codes. With slight changes in the optimization formulation, we could explore the possibility of incorporating this training framework into Person Re-Identification and related problems.
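The efficiency argument behind hash codes can be sketched in a few lines: real-valued embeddings are binarized, and matching reduces to cheap Hamming-distance comparisons. The sign-thresholding rule below is a common convention assumed for illustration; the deep networks that produce the embeddings are omitted.

```python
# Hedged sketch of hash-code matching on top of learned embeddings;
# the thesis trains deep networks to produce the embeddings, omitted here.

def to_hash_code(embedding):
    """Binarize a real-valued embedding into bits via sign thresholding."""
    return tuple(1 if v > 0 else 0 for v in embedding)

def hamming_distance(code_a, code_b):
    """Count differing bits; a small distance suggests the same identity."""
    return sum(a != b for a, b in zip(code_a, code_b))

# Two embeddings that agree in sign on 2 of 3 dimensions.
code_1 = to_hash_code([0.5, -0.2, 0.1])
code_2 = to_hash_code([0.4, -0.1, -0.3])
distance = hamming_distance(code_1, code_2)
```

Because Hamming distance is a bitwise operation, comparing a query code against millions of stored codes is far cheaper than evaluating a full similarity model per pair, which is the efficiency the proof of concept targets.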