Semantic Image Retrieval via Active Grounding of Visual Situations
We describe a novel architecture for semantic image retrieval---in
particular, retrieval of instances of visual situations. Visual situations are
concepts such as "a boxing match," "walking the dog," "a crowd waiting for a
bus," or "a game of ping-pong," whose instantiations in images are linked more
by their common spatial and semantic structure than by low-level visual
similarity. Given a query situation description, our architecture---called
Situate---learns models capturing the visual features of expected objects as
well as the expected spatial configuration of relationships among objects. Given a
new image, Situate uses these models in an attempt to ground (i.e., to create a
bounding box locating) each expected component of the situation in the image
via an active search procedure. Situate uses the resulting grounding to compute
a score indicating the degree to which the new image is judged to contain an
instance of the situation. Such scores can be used to rank images in a
collection as part of a retrieval system. In the preliminary study described
here, we demonstrate the promise of this system by comparing Situate's
performance with that of two baseline methods, as well as with a related
semantic image-retrieval system based on "scene graphs."
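The scoring idea described above can be sketched in a few lines: combine per-object detection confidences with a score for the joint spatial configuration of the grounded boxes. This is only an illustrative sketch; the function and variable names are hypothetical, not the authors' actual implementation.

```python
# Hypothetical sketch of Situate-style situation scoring (names are
# illustrative, not taken from the paper's implementation).
import math

def situation_score(groundings, appearance_scores, spatial_model):
    """Combine per-object appearance scores with a spatial-configuration model.

    groundings: dict mapping object name -> bounding box (x, y, w, h)
    appearance_scores: dict mapping object name -> detector confidence in (0, 1]
    spatial_model: callable scoring the joint box configuration in (0, 1]
    """
    # Each expected component of the situation contributes its detection
    # confidence (log-space, so the terms add)...
    appearance = sum(math.log(max(appearance_scores[name], 1e-9))
                     for name in groundings)
    # ...and the configuration of boxes is scored as a whole.
    spatial = math.log(max(spatial_model(groundings), 1e-9))
    return appearance + spatial

# Toy usage: "walking the dog" expects a person, a dog, and a leash.
boxes = {"person": (10, 5, 40, 90), "dog": (60, 70, 30, 25), "leash": (45, 40, 20, 35)}
scores = {"person": 0.9, "dog": 0.8, "leash": 0.4}
overlap_prior = lambda g: 0.7  # stand-in for a learned spatial model
print(situation_score(boxes, scores, overlap_prior))
```

Images in a collection could then be ranked by this score, as the abstract describes.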
Multimodal Grounding for Language Processing
This survey discusses how recent developments in multimodal processing
facilitate conceptual grounding of language. We categorize the information flow
in multimodal processing with respect to cognitive models of human information
processing and analyze different methods for combining multimodal
representations. Based on this methodological inventory, we discuss the benefit
of multimodal grounding for a variety of language processing tasks and the
challenges that arise. We particularly focus on multimodal grounding of verbs,
which play a crucial role in the compositional power of language.
Comment: The paper has been published in the Proceedings of the 27th International
Conference on Computational Linguistics. Please refer to this version for citations:
https://www.aclweb.org/anthology/papers/C/C18/C18-1197
Grounding semantic web services with rules
Semantic web services achieve effects in the world through web services, so the connection to those services - the grounding - is of paramount importance. The established technique is to use XML-based translations between ontologies and the SOAP message formats of the services, but these mappings cannot address the growing number of non-SOAP services, and they step outside the ontological world to describe the mapping. We present an approach which draws the service's interface into the ontology: we define ontology objects which represent the whole HTTP message, and use backward-chaining rules to translate between semantic service invocation instances and the HTTP messages passed to and from the service. We present a case study using Amazon's popular Simple Storage Service.
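The translation the abstract describes - from a semantic invocation instance to a concrete HTTP message - can be illustrated with a minimal sketch. A real system would express these as backward-chaining rules over ontology objects; here plain Python functions stand in for the rules, and all names (fields, endpoint) are illustrative assumptions, not the paper's actual vocabulary.

```python
# Illustrative sketch of rule-based grounding: translating a semantic
# "store object" invocation into an HTTP message description for S3.
# Field names and the endpoint are hypothetical stand-ins.

def invocation_to_http(invocation):
    """Ground a semantic S3 'store object' invocation as an HTTP request."""
    # Rule 1: the operation and resource determine the method and path.
    request = {
        "method": "PUT",
        "path": f"/{invocation['bucket']}/{invocation['key']}",
        "headers": {"Host": "s3.amazonaws.com"},
        "body": invocation["content"],
    }
    # Rule 2: content metadata on the invocation maps onto HTTP headers.
    if "content_type" in invocation:
        request["headers"]["Content-Type"] = invocation["content_type"]
    return request

req = invocation_to_http({"bucket": "my-bucket", "key": "notes.txt",
                          "content": "hello", "content_type": "text/plain"})
print(req["method"], req["path"])
```

In the paper's approach the inverse direction (HTTP response back to ontology instances) is handled by the same rule base run backward, which is what plain functions like this cannot capture.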
Action semantics at the bottom of the brain: Insights from dysplastic cerebellar gangliocytoma
Recent embodied cognition research shows that access to action verbs in shallow-processing tasks becomes selectively compromised upon atrophy of the cerebellum, a critical motor region. Here we assessed whether cerebellar damage also disturbs explicit semantic processing of action pictures and its integration with ongoing motor responses. We evaluated a cognitively preserved 33-year-old man with severe dysplastic cerebellar gangliocytoma (Lhermitte-Duclos disease), encompassing most of the right cerebellum and the posterior part of the left cerebellum. The patient and eight healthy controls completed two semantic association tasks (involving pictures of objects and actions, respectively) that required motor responses. Accuracy results via Crawford's modified t-tests revealed that the patient was selectively impaired in action association. Moreover, reaction-time analysis through Crawford's Revised Standardized Difference Test showed that, while processing of action concepts involved slower manual responses in controls, no such effect was observed in the patient, suggesting that motor-semantic integration dynamics may be compromised following cerebellar damage. Notably, a Bayesian Test for a Deficit allowing for Covariates revealed that these patterns remained after covarying for executive performance, indicating that they were not secondary to extra-linguistic impairments. Taken together, our results extend incipient findings on the embodied functions of the cerebellum, offering unprecedented evidence of its crucial role in processing non-verbal action meanings and integrating them with concomitant movements. These findings illuminate the relatively unexplored semantic functions of this region while calling for extensions of motor cognition models.
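The single-case statistic used above, Crawford's modified t-test (Crawford & Howell, 1998), compares one patient's score against a small control sample by inflating the standard error to treat the patient as a new observation. A minimal sketch, with made-up numbers for illustration:

```python
# Sketch of Crawford & Howell's (1998) modified t-test for comparing a
# single patient against a small control sample, as used in the study
# above. The scores below are invented for illustration only.
import math
from statistics import mean, stdev

def crawford_t(patient_score, controls):
    """Return (t, df) for a single-case comparison against controls."""
    n = len(controls)
    m, s = mean(controls), stdev(controls)  # control sample mean and SD
    # Standard error is inflated by sqrt((n + 1) / n) because the patient
    # is one new observation drawn from outside the control sample.
    t = (patient_score - m) / (s * math.sqrt((n + 1) / n))
    return t, n - 1  # refer t to Student's t with n - 1 degrees of freedom

t, df = crawford_t(12.0, [20.0, 22.0, 19.0, 21.0, 23.0, 20.0, 22.0, 21.0])
print(round(t, 3), df)
```

With eight controls, as in the study, the resulting t is referred to a t-distribution with 7 degrees of freedom; the Revised Standardized Difference Test and the Bayesian covariate-adjusted test mentioned in the abstract extend this same single-case logic.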
Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms
Question categorization and expert retrieval methods have been crucial for
information organization and accessibility in community question & answering
(CQA) platforms. Research in this area, however, has dealt with only the text
modality. With the increasing multimodal nature of web content, we focus on
extending these methods for CQA questions accompanied by images. Specifically,
we leverage the success of representation learning for text and images in the
visual question answering (VQA) domain, and adapt the underlying concept and
architecture for automated category classification and expert retrieval on
image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of
Yahoo! Answers.
To the best of our knowledge, this is the first work to tackle the
multimodality challenge in CQA, and to adapt VQA models for tasks on a more
ecologically valid source of visual questions. Our analysis of the differences
between visual QA and community QA data drives our proposal of novel
augmentations of an attention method tailored for CQA, and use of auxiliary
tasks for learning better grounding features. Our final model markedly
outperforms the text-only and VQA model baselines for both tasks of
classification and expert retrieval on real-world multimodal CQA data.
Comment: Submitted for review at CIKM 201
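The attention mechanism being adapted can be sketched minimally: region features from the image are pooled into one vector, weighted by their relevance to the question text. The dimensions and dot-product scoring below are illustrative assumptions, not the authors' model.

```python
# Minimal sketch of text-conditioned attention over image-region features,
# the kind of VQA-style mechanism adapted here for multimodal CQA
# questions (scoring and dimensions are illustrative, not the paper's).
import math

def attend(question_vec, region_vecs):
    """Pool region features into one vector, weighted by relevance to the text."""
    # Relevance of each region: dot product with the question representation.
    scores = [sum(q * r for q, r in zip(question_vec, region))
              for region in region_vecs]
    # Softmax the scores into attention weights (shifted for stability).
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of the region features.
    dim = len(region_vecs[0])
    return [sum(w * region[i] for w, region in zip(weights, region_vecs))
            for i in range(dim)]

pooled = attend([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
print([round(x, 3) for x in pooled])
```

The paper's contribution lies in augmenting such a mechanism for CQA data and adding auxiliary tasks; this sketch shows only the base attention step being adapted.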
On staying grounded and avoiding Quixotic dead ends
The 15 articles in this special issue on The Representation of Concepts illustrate the rich variety of theoretical positions and supporting research that characterize the area. Although much agreement exists among contributors, much disagreement exists as well, especially about the roles of grounding and abstraction in conceptual processing. I first review theoretical approaches raised in these articles that I believe are Quixotic dead ends, namely, approaches that are principled and inspired but likely to fail. In the process, I review various theories of amodal symbols, their distortions of grounded theories, and fallacies in the evidence used to support them. Incorporating further contributions across articles, I then sketch a theoretical approach that I believe is likely to be successful, which includes grounding, abstraction, flexibility, explaining classic conceptual phenomena, and making contact with real-world situations. This account further proposes that (1) a key element of grounding is neural reuse, (2) abstraction takes the forms of multimodal compression, distilled abstraction, and distributed linguistic representation (but not amodal symbols), and (3) flexible context-dependent representations are a hallmark of conceptual processing.