The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns
visual concepts, words, and semantic parsing of sentences without explicit
supervision on any of them; instead, our model learns by simply looking at
images and reading paired questions and answers. Our model builds an
object-based scene representation and translates sentences into executable,
symbolic programs. To bridge the learning of two modules, we use a
neuro-symbolic reasoning module that executes these programs on the latent
scene representation. Analogous to human concept learning, the perception
module learns visual concepts based on the language description of the object
being referred to. Meanwhile, the learned visual concepts facilitate learning
new words and parsing new sentences. We use curriculum learning to guide the
searching over the large compositional space of images and language. Extensive
experiments demonstrate the accuracy and efficiency of our model on learning
visual concepts, word representations, and semantic parsing of sentences.
Further, our method allows easy generalization to new object attributes,
compositions, language concepts, scenes and questions, and even new program
domains. It also empowers applications including visual question answering and
bidirectional image-text retrieval.
Comment: ICLR 2019 (Oral). Project page: http://nscl.csail.mit.edu
A Short Survey of Systematic Generalization
This survey covers systematic generalization and the history of how machine
learning has addressed it. We aim to summarize and organize the related work,
both conventional and recent. We first look at the definition
of systematic generalization, then introduce the Classicist and Connectionist views. We
then discuss different types of Connectionists and how they approach
generalization. Two crucial problems, variable binding and causality, are
discussed. We look into systematic generalization in language, vision, and VQA
fields. Recent improvements from different aspects are discussed. Systematic
generalization has a long history in artificial intelligence. We could cover
only a small portion of many contributions. We hope this paper provides a
background and is beneficial for discoveries in future work.
Visual Concept-Metaconcept Learning
Humans reason with concepts and metaconcepts: we recognize red and green from
visual input; we also understand that they describe the same property of
objects (i.e., the color). In this paper, we propose the visual
concept-metaconcept learner (VCML) for joint learning of concepts and
metaconcepts from images and associated question-answer pairs. The key is to
exploit the bidirectional connection between visual concepts and metaconcepts.
Visual representations provide grounding cues for predicting relations between
unseen pairs of concepts. Knowing that red and green describe the same property
of objects, we generalize to the fact that cube and sphere also describe the
same property of objects, since they both categorize the shape of objects.
Meanwhile, knowledge about metaconcepts empowers visual concept learning from
limited, noisy, and even biased data. From just a few examples of purple cubes
we can understand a new color purple, which resembles the hue of the cubes
instead of the shape of them. Evaluation on both synthetic and real-world
datasets validates our claims.
Comment: NeurIPS 2019. First two authors contributed equally. Project page: http://vcml.csail.mit.edu
Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques
specifically developed for analyzing and understanding the inner workings and
representations acquired by neural models of language. Approaches included:
systematic manipulation of input to neural networks and investigating the
impact on their performance, testing whether interpretable knowledge can be
decoded from intermediate representations acquired by neural networks,
proposing modifications to neural network architectures to make their knowledge
state or generated output more explainable, and examining the performance of
networks on simplified or formal languages. Here we review a number of
representative studies in each category.
Towards Equipping Transformer with the Ability of Systematic Compositionality
One of the key factors in language productivity and human cognition is the
ability of systematic compositionality, which refers to understanding composed
unseen examples of seen primitives. However, recent evidence reveals that the
Transformers have difficulty generalizing the composed context based on the
seen primitives. To this end, we take the first step to propose a
compositionality-aware Transformer called CAT and two novel pre-training tasks
to facilitate systematic compositionality. We tentatively provide a successful
implementation of a multi-layer CAT on the basis of the especially popular
BERT. The experimental results demonstrate that CAT outperforms baselines on
compositionality-aware tasks with minimal impact on the effectiveness on
standardized language understanding tasks.
Comment: Accepted to AAAI 2024. Paper with appendi
Neural-Symbolic Recursive Machine for Systematic Generalization
Despite the tremendous success, existing machine learning models still fall
short of human-like systematic generalization -- learning compositional rules
from limited data and applying them to unseen combinations in various domains.
We propose Neural-Symbolic Recursive Machine (NSR) to tackle this deficiency.
The core representation of NSR is a Grounded Symbol System (GSS) with
combinatorial syntax and semantics, which entirely emerges from training data.
Akin to the neuroscience studies suggesting separate brain systems for
perceptual, syntactic, and semantic processing, NSR implements analogous
separate modules of neural perception, syntactic parsing, and semantic
reasoning, which are jointly learned by a deduction-abduction algorithm. We
prove that NSR is expressive enough to model various sequence-to-sequence
tasks. Superior systematic generalization is achieved via the inductive biases
of equivariance and recursiveness embedded in NSR. In experiments, NSR achieves
state-of-the-art performance in three benchmarks from different domains: SCAN
for semantic parsing, PCFG for string manipulation, and HINT for arithmetic
reasoning. Specifically, NSR achieves 100% generalization accuracy on SCAN and
PCFG and outperforms state-of-the-art models on HINT by about 23%. Our NSR
demonstrates stronger generalization than pure neural networks due to its
symbolic representation and inductive biases. NSR also demonstrates better
transferability than existing neural-symbolic approaches due to less
domain-specific knowledge required.
Visually Grounded Language Learning: a review of language games, datasets, tasks, and models
In recent years, several machine learning models have been proposed. They are
trained with a language modelling objective on large-scale text-only data. With
such pretraining, they can achieve impressive results on many Natural Language
Understanding and Generation tasks. However, many facets of meaning cannot be
learned by "listening to the radio" only. In the literature, many
Vision+Language (V+L) tasks have been defined with the aim of creating models
that can ground symbols in the visual modality. In this work, we provide a
systematic literature review of several tasks and models proposed in the V+L
field. We rely on Wittgenstein's idea of 'language games' to categorise such
tasks into 3 different families: 1) discriminative games, 2) generative games,
and 3) interactive games. Our analysis of the literature provides evidence that
future work should be focusing on interactive games where communication in
Natural Language is important to resolve ambiguities about object referents and
action plans and that physical embodiment is essential to understand the
semantics of situations and events. Overall, these represent key requirements
for developing grounded meanings in neural models.
Comment: Preprint for JAIR before copyeditin