1,633 research outputs found
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns
visual concepts, words, and semantic parsing of sentences without explicit
supervision on any of them; instead, our model learns by simply looking at
images and reading paired questions and answers. Our model builds an
object-based scene representation and translates sentences into executable,
symbolic programs. To bridge the learning of two modules, we use a
neuro-symbolic reasoning module that executes these programs on the latent
scene representation. Analogical to human concept learning, the perception
module learns visual concepts based on the language description of the object
being referred to. Meanwhile, the learned visual concepts facilitate learning
new words and parsing new sentences. We use curriculum learning to guide the
searching over the large compositional space of images and language. Extensive
experiments demonstrate the accuracy and efficiency of our model on learning
visual concepts, word representations, and semantic parsing of sentences.
Further, our method allows easy generalization to new object attributes,
compositions, language concepts, scenes and questions, and even new program
domains. It also empowers applications including visual question answering and
bidirectional image-text retrieval.Comment: ICLR 2019 (Oral). Project page: http://nscl.csail.mit.edu
A Survey on Interpretable Cross-modal Reasoning
In recent years, cross-modal reasoning (CMR), the process of understanding
and reasoning across different modalities, has emerged as a pivotal area with
applications spanning from multimedia analysis to healthcare diagnostics. As
the deployment of AI systems becomes more ubiquitous, the demand for
transparency and comprehensibility in these systems' decision-making processes
has intensified. This survey delves into the realm of interpretable cross-modal
reasoning (I-CMR), where the objective is not only to achieve high predictive
performance but also to provide human-understandable explanations for the
results. This survey presents a comprehensive overview of the typical methods
with a three-level taxonomy for I-CMR. Furthermore, this survey reviews the
existing CMR datasets with annotations for explanations. Finally, this survey
summarizes the challenges for I-CMR and discusses potential future directions.
In conclusion, this survey aims to catalyze the progress of this emerging
research area by providing researchers with a panoramic and comprehensive
perspective, illuminating the state of the art and discerning the
opportunities
- …