
    Semantic Image Retrieval via Active Grounding of Visual Situations

    We describe a novel architecture for semantic image retrieval---in particular, retrieval of instances of visual situations. Visual situations are concepts such as "a boxing match," "walking the dog," "a crowd waiting for a bus," or "a game of ping-pong," whose instantiations in images are linked more by their common spatial and semantic structure than by low-level visual similarity. Given a query situation description, our architecture---called Situate---learns models capturing the visual features of expected objects as well as the expected spatial configuration of relationships among objects. Given a new image, Situate uses these models in an attempt to ground (i.e., to create a bounding box locating) each expected component of the situation in the image via an active search procedure. Situate uses the resulting grounding to compute a score indicating the degree to which the new image is judged to contain an instance of the situation. Such scores can be used to rank images in a collection as part of a retrieval system. In the preliminary study described here, we demonstrate the promise of this system by comparing Situate's performance with that of two baseline methods, as well as with a related semantic image-retrieval system based on "scene graphs."
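    The scoring-and-ranking step described above can be sketched as follows. This is a minimal illustration, not Situate's actual model: the `Grounding` type, the product combination rule, and all names are assumptions introduced here for clarity.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Grounding:
        component: str   # an expected situation component, e.g. "dog" or "leash"
        box: tuple       # hypothesized bounding box (x, y, w, h)
        score: float     # support for this component at this location, in [0, 1]

    def situation_score(groundings):
        """Combine per-component grounding scores into one image-level score.

        A simple product over components; Situate's real score combines
        learned visual and spatial models and is not reproduced here.
        """
        score = 1.0
        for g in groundings:
            score *= g.score
        return score

    def rank_images(image_groundings):
        """Rank images (name -> list of groundings) by descending score,
        as a retrieval system would order its results."""
        return sorted(image_groundings,
                      key=lambda name: situation_score(image_groundings[name]),
                      reverse=True)
    ```

    An image in which every expected component is grounded with high confidence thus rises to the top of the ranking, while an image missing any component scores low.
    
    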

    Abstract Concepts: Sensory-Motor Grounding, Metaphors, and Beyond

    In the last decade many researchers have obtained evidence for the idea that cognition shares processing mechanisms with perception and action. Most of the evidence supporting the grounded cognition framework has focused on representations of concrete concepts, which leaves open the question of how abstract concepts are grounded in sensory-motor processing. One promising idea is that people simulate concrete situations and introspective experiences to represent abstract concepts [Barsalou, L. W., & Wiemer-Hastings, K. (2005). Situating abstract concepts. In D. Pecher, & R. A. Zwaan (Eds.), Grounding cognition: The role of perception and action in memory, language, and thinking (pp. 129–163). Cambridge: Cambridge University Press.], although this idea has so far received little empirical investigation. A second idea, which more researchers have investigated, is that people use metaphorical mappings from concrete to abstract concepts [Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: Chicago University Press.]. According to this conceptual metaphor theory, image schemas structure and provide sensory-motor grounding for abstract concepts. Although there is evidence that people automatically activate image schemas when they process abstract concepts, we argue that situations are also needed to fully represent meaning.

    What working memory is for

    Glenberg focuses on conceptualizations that change from moment to moment, yet he dismisses the concept of working memory (sect. 4.3), which offers an account of temporary storage and on-line cognition. This commentary questions whether Glenberg's account adequately caters for observations of consistent data patterns in temporary storage of verbal and visuospatial information in healthy adults and in brain-damaged patients with deficits in temporary retention.

    Neurally Implementable Semantic Networks

    We propose general principles for semantic networks allowing them to be implemented as dynamical neural networks. Major features of our scheme include: (a) the interpretation that each node in a network stands for a bound integration of the meanings of all nodes and external events the node links with; (b) the systematic use of nodes that stand for categories or types, with separate nodes for instances of these types; (c) an implementation of relationships that does not use intrinsically typed links between nodes. Comment: 32 pages, 12 figures.
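    Principles (b) and (c) above can be illustrated with a toy data structure: type nodes and instance nodes are distinct nodes in the same graph, and the links between them carry no intrinsic type. All names here are illustrative and not taken from the paper.

    ```python
    class Node:
        """A node in a semantic network. Whether it denotes a type or an
        instance is a property of the node itself, not of its links."""
        def __init__(self, name):
            self.name = name
            self.links = set()   # untyped, symmetric links to other nodes

    def link(a, b):
        """Connect two nodes with an untyped, bidirectional link."""
        a.links.add(b)
        b.links.add(a)

    # A category node and an instance node are separate nodes, linked
    # by the same untyped mechanism as any other pair.
    dog_type = Node("dog (type)")
    fido = Node("Fido (instance)")
    link(dog_type, fido)
    ```

    Because links are untyped, the relationship "Fido is an instance of dog" is carried by the pattern of connected nodes rather than by a labeled edge, which is what makes the scheme amenable to a dynamical neural-network implementation.
    
    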

    Similarity learning for person re-identification and semantic video retrieval

    Many computer vision problems boil down to the learning of a good visual similarity function that calculates a score of how likely two instances share the same semantic concept. In this thesis, we focus on two problems related to similarity learning: Person Re-Identification, and Semantic Video Retrieval. Person Re-Identification aims to maintain the identity of an individual in diverse locations through different non-overlapping camera views. Starting with two cameras, we propose a novel visual word co-occurrence based appearance model to measure the similarities between pedestrian images. This model naturally accounts for spatial similarities and variations caused by pose, illumination and configuration changes across camera views. As a generalization to multiple camera views, we introduce the Group Membership Prediction (GMP) problem. The GMP problem involves predicting whether a collection of instances shares the same semantic property. In this context, we propose a novel probability model and introduce latent view-specific and view-shared random variables to jointly account for the view-specific appearance and cross-view similarities among data instances. Our method is tested on various benchmarks, demonstrating superior accuracy over the state-of-the-art. Semantic Video Retrieval seeks to match complex activities in a surveillance video to user-described queries. In surveillance scenarios, where noise and clutter are usually present, visual uncertainties introduced by error-prone low-level detectors, classifiers and trackers compose a significant part of the semantic gap between user-defined queries and the archive video. To bridge the gap, we propose a novel probabilistic activity localization formulation that incorporates learning of object attributes, between-object relationships, and object re-identification without activity-level training data.
    Our experiments demonstrate that the introduction of similarity learning components effectively compensates for noise and errors in earlier stages, and results in preferable performance on both aerial and ground surveillance videos. Considering the computational complexity of our similarity learning models, we attempt to develop a way of training complicated models efficiently while maintaining good performance. As a proof of concept, we propose training deep neural networks for supervised learning of hash codes. With slight changes in the optimization formulation, we could explore the possibilities of incorporating the training framework for Person Re-Identification and related problems.
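    The hash-code idea above can be sketched in miniature: map a feature vector through a projection, binarize it into a bit code, and compare codes by Hamming distance. The random projection here is a stand-in assumption; the thesis trains a deep network to produce the codes.

    ```python
    import numpy as np

    def hash_code(features, projection):
        """Binarize a linear projection of the feature vector into a bit code."""
        return (features @ projection > 0).astype(np.uint8)

    def hamming(a, b):
        """Number of differing bits; a small distance means similar instances."""
        return int(np.count_nonzero(a != b))

    # Illustrative use: a slightly perturbed copy of a feature vector should
    # land near the original in Hamming space, while its negation lands far.
    rng = np.random.default_rng(0)
    P = rng.standard_normal((8, 16))   # hypothetical 8-dim features -> 16 bits
    x = rng.standard_normal(8)
    code_x = hash_code(x, P)
    code_near = hash_code(x + 0.01 * rng.standard_normal(8), P)
    code_far = hash_code(-x, P)
    ```

    Binary codes make retrieval cheap: Hamming distance is a few machine instructions per comparison, which is why hashing is attractive for large-scale re-identification and video search.
    
    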