    Semantic Image Retrieval via Active Grounding of Visual Situations

    We describe a novel architecture for semantic image retrieval, in particular, retrieval of instances of visual situations. Visual situations are concepts such as "a boxing match," "walking the dog," "a crowd waiting for a bus," or "a game of ping-pong," whose instantiations in images are linked more by their common spatial and semantic structure than by low-level visual similarity. Given a query situation description, our architecture, called Situate, learns models capturing the visual features of expected objects as well as the expected spatial configuration of relationships among objects. Given a new image, Situate uses these models in an attempt to ground (i.e., to create a bounding box locating) each expected component of the situation in the image via an active search procedure. Situate uses the resulting grounding to compute a score indicating the degree to which the new image is judged to contain an instance of the situation. Such scores can be used to rank images in a collection as part of a retrieval system. In the preliminary study described here, we demonstrate the promise of this system by comparing Situate's performance with that of two baseline methods, as well as with a related semantic image-retrieval system based on "scene graphs."

    Multimodal Grounding for Language Processing

    This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. We particularly focus on multimodal grounding of verbs, which play a crucial role for the compositional power of language.
    Comment: The paper has been published in the Proceedings of the 27th International Conference on Computational Linguistics. Please refer to this version for citations: https://www.aclweb.org/anthology/papers/C/C18/C18-1197

    Grounding semantic web services with rules

    Semantic web services achieve effects in the world through web services, so the connection to those services, the grounding, is of paramount importance. The established technique is to use XML-based translations between ontologies and the SOAP message formats of the services, but these mappings cannot address the growing number of non-SOAP services, and they step outside the ontological world to describe the mapping. We present an approach which draws the service's interface into the ontology: we define ontology objects which represent the whole HTTP message, and use backward-chaining rules to translate between semantic service invocation instances and the HTTP messages passed to and from the service. We present a case study using Amazon's popular Simple Storage Service.

    Action semantics at the bottom of the brain: Insights from dysplastic cerebellar gangliocytoma

    Recent embodied cognition research shows that access to action verbs in shallow-processing tasks becomes selectively compromised upon atrophy of the cerebellum, a critical motor region. Here we assessed whether cerebellar damage also disturbs explicit semantic processing of action pictures and its integration with ongoing motor responses. We evaluated a cognitively preserved 33-year-old man with severe dysplastic cerebellar gangliocytoma (Lhermitte-Duclos disease), encompassing most of the right cerebellum and the posterior part of the left cerebellum. The patient and eight healthy controls completed two semantic association tasks (involving pictures of objects and actions, respectively) that required motor responses. Accuracy results via Crawford's modified t-tests revealed that the patient was selectively impaired in action association. Moreover, reaction-time analysis through Crawford's Revised Standardized Difference Test showed that, while processing of action concepts involved slower manual responses in controls, no such effect was observed in the patient, suggesting that motor-semantic integration dynamics may be compromised following cerebellar damage. Notably, a Bayesian Test for a Deficit allowing for Covariates revealed that these patterns remained after covarying for executive performance, indicating that they were not secondary to extra-linguistic impairments. Taken together, our results extend incipient findings on the embodied functions of the cerebellum, offering unprecedented evidence of its crucial role in processing non-verbal action meanings and integrating them with concomitant movements. These findings illuminate the relatively unexplored semantic functions of this region while calling for extensions of motor cognition models.
    Affiliation: Cervetto Manciameli, Sabrina Fabiana. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Neurociencia Cognitiva. Fundación Favaloro. Instituto de Neurociencia Cognitiva; Argentina. Universidad de la República; Uruguay
    Affiliation: Abrevaya, Sofia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Neurociencia Cognitiva. Fundación Favaloro. Instituto de Neurociencia Cognitiva; Argentina
    Affiliation: Martorell Caro, Miguel Angel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Neurociencia Cognitiva. Fundación Favaloro. Instituto de Neurociencia Cognitiva; Argentina
    Affiliation: Kozono, Giselle. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Neurociencia Cognitiva. Fundación Favaloro. Instituto de Neurociencia Cognitiva; Argentina
    Affiliation: Muñoz, Edinson. Universidad de Santiago de Chile; Chile
    Affiliation: Ferrari, Jesica. Instituto de Neurología Cognitiva; Argentina
    Affiliation: Sedeño, Lucas. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Neurociencia Cognitiva. Fundación Favaloro. Instituto de Neurociencia Cognitiva; Argentina
    Affiliation: Ibáñez Barassi, Agustín Mariano. Australian Research Council; Australia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Neurociencia Cognitiva. Fundación Favaloro. Instituto de Neurociencia Cognitiva; Argentina. Universidad Autónoma del Caribe; Colombia. Universidad Adolfo Ibañez; Chile
    Affiliation: García, Adolfo Martín. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Neurociencia Cognitiva. Fundación Favaloro. Instituto de Neurociencia Cognitiva; Argentina. Universidad Nacional de Cuyo; Argentina

    Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms

    Question categorization and expert retrieval methods have been crucial for information organization and accessibility in community question & answering (CQA) platforms. Research in this area, however, has dealt with only the text modality. With the increasing multimodal nature of web content, we focus on extending these methods for CQA questions accompanied by images. Specifically, we leverage the success of representation learning for text and images in the visual question answering (VQA) domain, and adapt the underlying concept and architecture for automated category classification and expert retrieval on image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of Yahoo! Answers. To the best of our knowledge, this is the first work to tackle the multimodality challenge in CQA, and to adapt VQA models for tasks on a more ecologically valid source of visual questions. Our analysis of the differences between visual QA and community QA data drives our proposal of novel augmentations of an attention method tailored for CQA, and use of auxiliary tasks for learning better grounding features. Our final model markedly outperforms the text-only and VQA model baselines for both tasks of classification and expert retrieval on real-world multimodal CQA data.
    Comment: Submitted for review at CIKM 201

    On staying grounded and avoiding Quixotic dead ends

    The 15 articles in this special issue on The Representation of Concepts illustrate the rich variety of theoretical positions and supporting research that characterize the area. Although much agreement exists among contributors, much disagreement exists as well, especially about the roles of grounding and abstraction in conceptual processing. I first review theoretical approaches raised in these articles that I believe are Quixotic dead ends, namely, approaches that are principled and inspired but likely to fail. In the process, I review various theories of amodal symbols, their distortions of grounded theories, and fallacies in the evidence used to support them. Incorporating further contributions across articles, I then sketch a theoretical approach that I believe is likely to be successful, which includes grounding, abstraction, flexibility, explaining classic conceptual phenomena, and making contact with real-world situations. This account further proposes that (1) a key element of grounding is neural reuse, (2) abstraction takes the forms of multimodal compression, distilled abstraction, and distributed linguistic representation (but not amodal symbols), and (3) flexible context-dependent representations are a hallmark of conceptual processing.