Neural Sequence-to-grid Module for Learning Symbolic Rules
Logical reasoning tasks over symbols, such as learning arithmetic operations
and evaluating computer programs, remain challenging for deep learning. In
particular, even state-of-the-art neural networks fail to achieve
out-of-distribution (OOD) generalization on symbolic reasoning tasks,
whereas humans can easily extend learned symbolic rules. To resolve this
difficulty, we propose a neural sequence-to-grid (seq2grid) module, an input
preprocessor that automatically segments and aligns an input sequence into a
grid. As our module outputs a grid via a novel differentiable mapping, any
neural network structure taking a grid input, such as ResNet or TextCNN, can be
jointly trained with our module in an end-to-end fashion. Extensive experiments
show that neural networks equipped with our module as an input preprocessor
achieve OOD generalization on various arithmetic and algorithmic problems,
including number sequence prediction, algebraic word problems, and computer
program evaluation, while other state-of-the-art sequence transduction
models cannot. Moreover, we verify that our module enhances TextCNN to solve
the bAbI QA tasks without external memory. Comment: 9 pages, 9 figures, AAAI 202
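The abstract describes an input preprocessor that segments and aligns a token sequence into a grid. As a rough intuition only, the core segment-and-align idea can be sketched with hard decisions; note that the paper's actual module is differentiable and learns soft placement actions, and the delimiter token used here is purely hypothetical:

```python
import numpy as np

def seq_to_grid(tokens, height, width, pad=0):
    """Toy, non-differentiable sketch of the seq2grid idea:
    write tokens left-to-right into grid rows, starting a new
    row at each delimiter (here the hypothetical token -1).
    The actual module instead learns soft placement actions
    so the mapping stays differentiable."""
    grid = np.full((height, width), pad, dtype=int)
    r, c = 0, 0
    for t in tokens:
        if t == -1:                  # delimiter: move to a fresh row
            r, c = r + 1, 0
            continue
        if r < height and c < width:  # drop tokens that overflow the grid
            grid[r, c] = t
            c += 1
    return grid
```

Any grid-input network (e.g. a small ConvNet) could then consume the resulting array; in the paper that downstream network and the module are trained jointly.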
A survey of cross-lingual word embedding models
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent, modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.
Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings
Transformers generalize to novel compositions of structures and entities
after being trained on a complex dataset, but easily overfit on datasets of
insufficient complexity. We observe that when the training set is sufficiently
complex, the model encodes sentences that have a common syntactic structure
using a systematic attention pattern. Inspired by this observation, we propose
SQ-Transformer (Structurally Quantized) that explicitly encourages
systematicity in the embeddings and attention layers, even with a training set
of low complexity. At the embedding level, we introduce Structure-oriented
Vector Quantization (SoVQ) to cluster word embeddings into several classes of
structurally equivalent entities. At the attention level, we devise the
Systematic Attention Layer (SAL) and an alternative, Systematically Regularized
Layer (SRL) that operate on the quantized word embeddings so that sentences of
the same structure are encoded with invariant or similar attention patterns.
Empirically, we show that SQ-Transformer achieves stronger compositional
generalization than the vanilla Transformer on multiple low-complexity semantic
parsing and machine translation datasets. In our analysis, we show that SoVQ
indeed learns a syntactically clustered embedding space and SAL/SRL induces
generalizable attention patterns, which lead to improved systematicity. Comment: 22 pages, code: https://github.com/jiangycTarheel/SQ-Transforme
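The abstract's Structure-oriented Vector Quantization clusters word embeddings into classes of structurally equivalent entities. The nearest-codebook assignment at the heart of any vector quantization step can be sketched as follows; the codebook values are made up for illustration, and SoVQ's actual training objective, which pushes the classes to align with syntactic roles, is omitted:

```python
import numpy as np

def quantize(embeddings, codebook):
    """Nearest-codebook assignment, the core of vector quantization:
    each word embedding (row of `embeddings`, shape (N, D)) is mapped
    to the closest of K class vectors (rows of `codebook`, shape (K, D)).
    SoVQ additionally learns the codebook so that classes correspond
    to structurally equivalent entities; that objective is not shown."""
    # pairwise squared distances between embeddings and codes: (N, K)
    d = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    ids = d.argmin(axis=1)            # class index per word
    return ids, codebook[ids]         # class ids and quantized embeddings
```

Downstream attention layers such as the paper's SAL/SRL would then operate on the quantized embeddings so that sentences with the same structure receive similar attention patterns.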
A Short Survey of Systematic Generalization
This survey covers systematic generalization and the history of how machine
learning has addressed it. We aim to summarize and organize the relevant work
on both conventional approaches and recent improvements. We first look at the
definition of systematic generalization, then introduce the Classicist and
Connectionist perspectives. We then discuss different types of Connectionists
and how they approach generalization. Two crucial problems, variable binding
and causality, are discussed. We look into systematic generalization in the
language, vision, and VQA fields. Recent improvements from different aspects
are discussed. Systematic generalization has a long history in artificial
intelligence, and we could cover only a small portion of the many
contributions. We hope this paper provides useful background and benefits
future work.
Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field
that aims to design computer agents with intelligent capabilities such as
understanding, reasoning, and learning through integrating multiple
communicative modalities, including linguistic, acoustic, visual, tactile, and
physiological messages. With the recent interest in video understanding,
embodied autonomous agents, text-to-image generation, and multisensor fusion in
application domains such as healthcare and robotics, multimodal machine
learning has brought unique computational and theoretical challenges to the
machine learning community given the heterogeneity of data sources and the
interconnections often found between modalities. However, the breadth of
progress in multimodal research has made it difficult to identify the common
themes and open questions in the field. By synthesizing a broad range of
application domains and theoretical frameworks from both historical and recent
perspectives, this paper is designed to provide an overview of the
computational and theoretical foundations of multimodal machine learning. We
start by defining two key principles of modality heterogeneity and
interconnections that have driven subsequent innovations, and propose a
taxonomy of six core technical challenges: representation, alignment,
reasoning, generation, transference, and quantification, covering historical
and recent trends. Recent technical achievements will be presented through the
lens of this taxonomy, allowing researchers to understand the similarities and
differences across new approaches. We end by motivating several open problems
for future research as identified by our taxonomy.
Weakly Supervised Visual Semantic Parsing
Scene Graph Generation (SGG) aims to extract entities, predicates and their
semantic structure from images, enabling deep understanding of visual content,
with many applications such as visual reasoning and image retrieval.
Nevertheless, existing SGG methods require millions of manually annotated
bounding boxes for training, and are computationally inefficient, as they
exhaustively process all pairs of object proposals to detect predicates. In
this paper, we address those two limitations by first proposing a generalized
formulation of SGG, namely Visual Semantic Parsing, which disentangles entity
and predicate recognition, and enables sub-quadratic performance. Then we
propose the Visual Semantic Parsing Network, VSPNet, based on a dynamic,
attention-based, bipartite message passing framework that jointly infers graph
nodes and edges through an iterative process. Additionally, we propose the
first graph-based weakly supervised learning framework, based on a novel graph
alignment algorithm, which enables training without bounding box annotations.
Through extensive experiments, we show that VSPNet outperforms weakly
supervised baselines significantly and approaches fully supervised performance,
while being several times faster. We publicly release the source code of our
method. Comment: To be presented at CVPR 2020 (oral paper)
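The abstract describes a bipartite message-passing framework that jointly infers graph nodes (entities) and edges (predicates). One round of the two-way exchange between the two node sets can be sketched as below; VSPNet's actual update rules, gating, and iteration schedule differ, and the additive update here is an assumption for illustration:

```python
import numpy as np

def bipartite_round(ent, pred, att_ep):
    """One hypothetical round of bipartite message passing between
    entity features `ent` (N, D) and predicate features `pred` (M, D),
    linked by attention weights `att_ep` (N, M). Each side aggregates
    messages from the other; iterating such rounds jointly refines
    graph nodes and edges, as in the framework the abstract describes."""
    ent_new = ent + att_ep @ pred      # entities gather predicate messages
    pred_new = pred + att_ep.T @ ent   # predicates gather entity messages
    return ent_new, pred_new
```

Because only entity-predicate pairs connected by the (learned, dynamic) attention exchange messages, such a scheme avoids exhaustively scoring all pairs of object proposals, which is the inefficiency the abstract targets.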