204 research outputs found
Equivariant Transduction through Invariant Alignment
The ability to generalize compositionally is key to understanding the
potentially infinite number of sentences that can be constructed in a human
language from only a finite number of words. Investigating whether NLP models
possess this ability has been a topic of interest: SCAN (Lake and Baroni, 2018)
is one task specifically proposed to test for this property. Previous work has
achieved impressive empirical results using a group-equivariant neural network
that naturally encodes a useful inductive bias for SCAN (Gordon et al., 2020).
Inspired by this, we introduce a novel group-equivariant architecture that
incorporates a group-invariant hard alignment mechanism. We find that our
network's structure allows it to develop stronger equivariance properties than
existing group-equivariant approaches. We additionally find that it outperforms
previous group-equivariant networks empirically on the SCAN task. Our results
suggest that integrating group-equivariance into a variety of neural
architectures is a potentially fruitful avenue of research, and demonstrate the
value of careful analysis of the theoretical properties of such architectures.
Comment: Accepted at COLING 202
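Group equivariance, the inductive bias this line of work builds on, is easy to state concretely: a map f is equivariant to a group G when f(g·x) = g·f(x) for every g in G. A minimal sketch of such a check, not the paper's model, using the group of circular shifts and a circular convolution (which commutes with shifts) purely for illustration:

```python
def circular_conv(x, kernel):
    """Circularly convolve a 1-D sequence with a kernel.

    Circular convolution commutes with circular shifts, so it is
    equivariant to the cyclic shift group: f(g.x) == g.f(x).
    """
    n = len(x)
    return [sum(kernel[j] * x[(i - j) % n] for j in range(len(kernel)))
            for i in range(n)]

def roll(x, s):
    """Apply the group element 'shift by s positions' to a sequence."""
    return [x[(i - s) % len(x)] for i in range(len(x))]

x = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.25]

# Equivariance check: shifting the input, then applying f, equals
# applying f, then shifting the output.
assert circular_conv(roll(x, 1), kernel) == roll(circular_conv(x, kernel), 1)
```

The same identity, with a different group (e.g., permutations of SCAN primitives) and a learned f, is what a group-equivariant architecture guarantees by construction rather than by check.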
Neural-Symbolic Recursive Machine for Systematic Generalization
Despite their tremendous success, existing machine learning models still fall
short of human-like systematic generalization -- learning compositional rules
from limited data and applying them to unseen combinations in various domains.
We propose Neural-Symbolic Recursive Machine (NSR) to tackle this deficiency.
The core representation of NSR is a Grounded Symbol System (GSS) with
combinatorial syntax and semantics, which entirely emerges from training data.
Akin to the neuroscience studies suggesting separate brain systems for
perceptual, syntactic, and semantic processing, NSR implements analogous
separate modules of neural perception, syntactic parsing, and semantic
reasoning, which are jointly learned by a deduction-abduction algorithm. We
prove that NSR is expressive enough to model various sequence-to-sequence
tasks. Superior systematic generalization is achieved via the inductive biases
of equivariance and recursiveness embedded in NSR. In experiments, NSR achieves
state-of-the-art performance in three benchmarks from different domains: SCAN
for semantic parsing, PCFG for string manipulation, and HINT for arithmetic
reasoning. Specifically, NSR achieves 100% generalization accuracy on SCAN and
PCFG and outperforms state-of-the-art models on HINT by about 23%. Our NSR
demonstrates stronger generalization than pure neural networks due to its
symbolic representation and inductive biases. NSR also demonstrates better
transferability than existing neural-symbolic approaches due to less
domain-specific knowledge required.
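The kind of systematic generalization SCAN tests, learning compositional rules from limited data and applying them to unseen combinations, can be made concrete with a tiny symbolic interpreter. This is a hedged sketch, not NSR itself: the rules and simplified action names (SCAN's actual actions are prefixed, e.g. I_JUMP) are chosen only to illustrate why recursive symbolic rules compose to unseen inputs by construction:

```python
# Primitive commands mapped to (simplified) action symbols.
PRIMITIVES = {"jump": ["JUMP"], "walk": ["WALK"], "run": ["RUN"]}

def interpret(command):
    """Recursively interpret a SCAN-style command into an action sequence.

    Modifier rules like "X twice -> X X" apply to *any* sub-command,
    so combinations never seen together still compose correctly.
    """
    words = command.split()
    if words[-1] == "twice":
        return interpret(" ".join(words[:-1])) * 2
    if words[-1] == "thrice":
        return interpret(" ".join(words[:-1])) * 3
    return PRIMITIVES[words[0]]

# Even if "jump" never co-occurred with "thrice" in training data,
# the symbolic rule composes:
assert interpret("jump thrice") == ["JUMP", "JUMP", "JUMP"]
assert interpret("walk twice") == ["WALK", "WALK"]
```

A purely neural sequence-to-sequence model has no such guarantee; the point of a grounded symbol system is to make the learned representation behave like this interpreter.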
Neurosymbolic Grounding for Compositional World Models
We introduce Cosmos, a framework for object-centric world modeling that is
designed for compositional generalization (CG), i.e., high performance on
unseen input scenes obtained through the composition of known visual "atoms."
The central insight behind Cosmos is the use of a novel form of neurosymbolic
grounding. Specifically, the framework introduces two new tools: (i)
neurosymbolic scene encodings, which represent each entity in a scene using a
real vector computed using a neural encoder, as well as a vector of composable
symbols describing attributes of the entity, and (ii) a neurosymbolic attention
mechanism that binds these entities to learned rules of interaction. Cosmos is
end-to-end differentiable; also, unlike traditional neurosymbolic methods that
require representations to be manually mapped to symbols, it computes an
entity's symbolic attributes using vision-language foundation models. Through
an evaluation that considers two different forms of CG on an established
blocks-pushing domain, we show that the framework establishes a new
state-of-the-art for CG in world modeling.
Symmetry-Preserving Program Representations for Learning Code Semantics
Large Language Models (LLMs) have shown promise in automated program
reasoning, a crucial aspect of many security tasks. However, existing LLM
architectures for code are often borrowed from other domains like natural
language processing, raising concerns about their generalization and robustness
to unseen code. A key generalization challenge is to incorporate the knowledge
of code semantics, including control and data flow, into the LLM architectures.
Drawing inspiration from examples of convolution layers exploiting
translation symmetry, we explore how code symmetries can enhance LLM
architectures for program analysis and modeling. We present a rigorous
group-theoretic framework that formally defines code symmetries as
semantics-preserving transformations and provides techniques for precisely
reasoning about symmetry preservation within LLM architectures. Using this
framework, we introduce a novel variant of self-attention that preserves
program symmetries, demonstrating its effectiveness in generalization and
robustness through detailed experimental evaluations across different binary
and source code analysis tasks. Overall, our code symmetry framework offers
rigorous and powerful reasoning techniques that can guide the future
development of specialized LLMs for code and advance LLM-guided program
reasoning tasks.
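One concrete symmetry such a framework can reason about: self-attention with no positional encoding is equivariant to permutations of its input tokens, which is the building block for preserving semantics under reorderings of independent code statements. A toy sketch, not the paper's attention variant, using identity Q/K/V projections (an assumption made purely to keep the example short):

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """Toy self-attention with Q = K = V = token embeddings, no positions."""
    scores = [[sum(a * b for a, b in zip(q, k)) for k in tokens]
              for q in tokens]
    weights = [softmax(row) for row in scores]
    dim = len(tokens[0])
    return [[sum(w * v[d] for w, v in zip(row, tokens)) for d in range(dim)]
            for row in weights]

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
perm = [2, 0, 1]  # one permutation of the three tokens

# Equivariance: permuting inputs permutes outputs the same way.
out_then_perm = [self_attention(tokens)[i] for i in perm]
perm_then_out = self_attention([tokens[i] for i in perm])
assert all(abs(a - b) < 1e-12
           for ra, rb in zip(out_then_perm, perm_then_out)
           for a, b in zip(ra, rb))
```

Adding standard positional encodings breaks this symmetry, which is exactly why an architecture intended to respect code symmetries must treat position information with care.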