Disentangling Extraction and Reasoning in Multi-hop Spatial Reasoning
Spatial reasoning over text is challenging: models must not only extract explicit spatial information from the text but also reason over it to infer implicit spatial relations. Recent studies highlight that even large language models struggle with spatial reasoning over text. In this paper, we explore the potential benefits of disentangling the processes of information extraction and reasoning to address this challenge. To this end, we design various models that disentangle extraction and reasoning (either symbolic or neural) and compare them with state-of-the-art (SOTA) baselines that have no explicit design for these parts. Our experimental results consistently demonstrate the efficacy of disentangling, showcasing its ability to enhance models' generalizability
within realistic data domains.
Comment: Accepted in EMNLP Findings 2023
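To make the disentangling idea concrete, below is a minimal sketch of a two-stage pipeline: stage 1 would extract explicit spatial triples from the text, and stage 2 reasons over them symbolically to derive implicit relations. The relation vocabulary, the composition rules, and the `saturate` helper are illustrative assumptions, not the paper's actual models.

```python
# Minimal sketch of the disentangling idea (not the authors' code):
# stage 1 extracts explicit spatial triples; stage 2 reasons over them
# symbolically. Relation names and rules here are illustrative only.
from itertools import product

# Converse pairs: if A is left of B, then B is right of A.
CONVERSE = {"left": "right", "right": "left", "above": "below", "below": "above"}

def saturate(triples):
    """Derive implicit relations via converses and transitivity until fixpoint."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for a, r, b in facts:
            new.add((b, CONVERSE[r], a))                      # converse rule
        for (a, r1, b), (c, r2, d) in product(facts, facts):  # transitivity
            if b == c and r1 == r2:
                new.add((a, r1, d))
        if not new <= facts:
            facts |= new
            changed = True
    return facts

# Explicit facts a stage-1 extractor might produce from text like
# "The ball is left of the box. The box is left of the lamp."
explicit = {("ball", "left", "box"), ("box", "left", "lamp")}
print(("ball", "left", "lamp") in saturate(explicit))  # True: inferred multi-hop
```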
Graph Neural Networks Extract High-Resolution Cultivated Land Maps from Sentinel-2 Image Series
Maintaining farm sustainability by optimizing agricultural management practices helps build a more planet-friendly environment. Emerging satellite missions can acquire multi- and hyperspectral imagery that captures detailed spectral information about the scanned area, letting agricultural applications benefit from subtle spectral features during analysis. We introduce an approach for extracting 2.5 m cultivated land maps from 10 m Sentinel-2 multispectral image series that benefits from a compact graph convolutional neural network. The experiments indicate that our models not only outperform classical and deep machine learning techniques by delivering higher-quality segmentation maps, but also dramatically reduce the memory footprint compared to U-Nets (almost 8k trainable parameters in our models versus up to 31M in U-Nets). Such memory frugality is pivotal for missions that uplink a model to an AI-powered satellite once it is in orbit, as sending large networks is infeasible under the transmission time constraints.
Comment: 7 pages (including supplementary material), published in IEEE Geoscience and Remote Sensing Letters
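As a rough illustration of how compact such a model can be, here is a sketch of a tiny graph-convolutional segmenter in PyTorch. The layer widths, the number of input bands, and the dense normalized adjacency `a_hat` are assumptions for illustration; the paper's actual architecture and graph construction may differ.

```python
# Illustrative sketch of a compact graph convolutional segmenter, not the
# authors' model. Pixels are graph nodes with spectral features; a_hat is an
# assumed pre-normalized adjacency matrix.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One GCN layer: X' = relu(A_hat @ X @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, a_hat):
        return torch.relu(self.lin(a_hat @ x))

class CompactGCNSegmenter(nn.Module):
    """Tiny stack of graph convolutions ending in a per-node class score."""
    def __init__(self, n_bands=10, hidden=32, n_classes=2):
        super().__init__()
        self.gc1 = GraphConv(n_bands, hidden)
        self.gc2 = GraphConv(hidden, hidden)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x, a_hat):
        # x: (n_pixels, n_bands) spectral features; a_hat: (n_pixels, n_pixels)
        return self.head(self.gc2(self.gc1(x, a_hat), a_hat))

model = CompactGCNSegmenter()
print(sum(p.numel() for p in model.parameters()))  # a few thousand parameters
```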
Bird-Eye Transformers for Text Generation Models
Transformers have become an indispensable module for text generation models since their great success in machine translation. Previous works attribute the success of transformers to the query-key-value dot-product attention, which provides a robust inductive bias via fully connected token graphs. However, we found that self-attention has a severe limitation. When predicting the (i+1)-th token, self-attention only takes the i-th token as an information collector, and it tends to assign high attention weights to tokens similar to itself. Therefore, much of the historical information that occurred before the i-th token is not taken into consideration. Based on this observation, in
this paper, we propose a new architecture, called bird-eye transformer (BET), which goes one step further in improving the performance of transformers by reweighting self-attention to encourage it to focus more on important historical information. We have conducted experiments on multiple text generation tasks, including machine translation (2 datasets) and language modeling (3 datasets). The experimental results show that our proposed model achieves better performance than the baseline transformer architectures on all datasets. The code is released at: https://sites.google.com/view/bet-transformer/home
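The following sketch illustrates one possible reading of the reweighting idea: mix the standard causal attention distribution with a "bird's-eye" distribution that scores how much attention each history token receives on average, so globally important tokens gain weight. The mixing coefficient `alpha` and the averaging rule are assumptions, not the released BET implementation.

```python
# Illustrative attention reweighting in the spirit of the abstract, not the
# authors' exact formulation: local causal attention is blended with a global
# "bird's-eye" token-importance distribution.
import torch
import torch.nn.functional as F

def birdeye_attention(q, k, v, alpha=0.5):
    # q, k, v: (seq_len, d); causal mask so token i attends to tokens <= i
    seq_len, d = q.shape
    scores = (q @ k.T) / d ** 0.5
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    local = F.softmax(scores, dim=-1)               # standard self-attention
    global_w = local.mean(dim=0, keepdim=True)      # bird's-eye token importance
    mixed = (1 - alpha) * local + alpha * global_w  # reweighted attention
    mixed = mixed.masked_fill(mask, 0.0)            # keep causality
    mixed = mixed / mixed.sum(dim=-1, keepdim=True) # renormalize rows
    return mixed @ v

out = birdeye_attention(torch.randn(5, 16), torch.randn(5, 16), torch.randn(5, 16))
print(out.shape)  # torch.Size([5, 16])
```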
DepWiGNN: A Depth-wise Graph Neural Network for Multi-hop Spatial Reasoning in Text
Spatial reasoning in text plays a crucial role in various real-world applications. Existing approaches to spatial reasoning typically infer spatial relations from pure text, overlooking the gap between natural language and symbolic structures. Graph neural networks (GNNs) have showcased exceptional proficiency in inducing and aggregating symbolic structures. However, classical GNNs face challenges in handling multi-hop spatial reasoning due to the over-smoothing issue, i.e., performance decreases substantially as the number of graph layers increases. To cope with these challenges, we propose a novel Depth-Wise Graph Neural Network (DepWiGNN). Specifically, we design a novel node memory scheme and aggregate information over the depth dimension of the graph instead of the breadth dimension, which enables collecting long dependencies without stacking multiple layers. Experimental results on two challenging multi-hop spatial reasoning datasets show that DepWiGNN outperforms existing spatial reasoning methods. Comparisons with three other GNNs further demonstrate its superiority in capturing long dependencies in the graph.
Comment: EMNLP 2023 Findings
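Here is a small sketch of the depth-wise intuition, as opposed to breadth-wise layer stacking: each node fills a memory with one slot per hop distance in a single pass, so long dependencies are gathered without deep GNN stacks. The BFS grouping, the mean aggregator, and the flat read-out are illustrative assumptions rather than DepWiGNN's actual node-memory scheme.

```python
# Sketch of depth-wise aggregation: one memory slot per hop depth, filled in
# a single BFS pass instead of stacking L message-passing layers.
import numpy as np
from collections import deque

def hop_neighbourhoods(adj, src, max_depth):
    """BFS from src, grouping node indices by hop distance."""
    dist = {src: 0}
    queue = deque([src])
    hops = [[] for _ in range(max_depth + 1)]
    hops[0].append(src)
    while queue:
        u = queue.popleft()
        if dist[u] == max_depth:
            continue
        for v in np.flatnonzero(adj[u]):
            if v not in dist:
                dist[v] = dist[u] + 1
                hops[dist[v]].append(v)
                queue.append(v)
    return hops

def depthwise_memory(adj, feats, src, max_depth=3):
    """Node memory: one slot per depth, each the mean feature at that hop."""
    memory = np.zeros((max_depth + 1, feats.shape[1]))
    for depth, nodes in enumerate(hop_neighbourhoods(adj, src, max_depth)):
        if nodes:
            memory[depth] = feats[nodes].mean(axis=0)
    return memory.reshape(-1)  # read out across the depth dimension

adj = np.array([[0, 1, 0, 0],   # chain graph 0-1-2-3
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
feats = np.eye(4)
print(depthwise_memory(adj, feats, src=0).shape)  # (16,) = 4 depths x 4 dims
```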
Memory-Constrained Policy Optimization
We introduce a new constrained optimization method for policy gradient reinforcement learning, which uses two trust regions to regulate each policy update. In addition to using proximity to a single old policy as the first trust region, as done in prior works, we propose to form a second trust region by constructing a virtual policy that represents a wide range of past policies. We then enforce the new policy to stay close to the virtual policy, which is beneficial when the old policy performs badly. More importantly, we propose a mechanism to automatically build the virtual policy from a memory buffer of past policies, providing a new capability for dynamically selecting appropriate trust regions during optimization. Our proposed method, dubbed Memory-Constrained Policy Optimization (MCPO), is examined on a diverse suite of environments, including robotic locomotion control, navigation with sparse rewards, and Atari games, and consistently demonstrates competitive performance against recent on-policy constrained policy gradient methods.
Comment: Preprint, 24 pages
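The dual trust region can be sketched as a penalized surrogate objective: one KL term keeps the new policy near the previous one, and a second keeps it near a virtual policy formed from a memory of past policies. Averaging the stored action distributions and the fixed penalty weights `beta1`/`beta2` are simplifying assumptions here; the paper builds and selects the virtual policy adaptively rather than with fixed penalties.

```python
# Schematic dual trust-region penalty in the spirit of MCPO, not the paper's
# exact update rule: surrogate gain minus a KL to the old policy and a KL to
# a virtual policy averaged from a memory of past policies.
import torch

def mcpo_loss(logp_new, logp_old, probs_new, probs_old, probs_memory,
              advantages, beta1=1.0, beta2=1.0):
    # logp_*: (batch,) log-probs of taken actions; probs_*: (batch, n_actions)
    ratio = torch.exp(logp_new - logp_old)
    surrogate = (ratio * advantages).mean()
    # Virtual policy: mean of the action distributions stored in memory.
    probs_virtual = torch.stack(probs_memory).mean(dim=0)
    kl_old = (probs_old * (probs_old.log() - probs_new.log())).sum(-1).mean()
    kl_virt = (probs_virtual * (probs_virtual.log() - probs_new.log())).sum(-1).mean()
    return -(surrogate - beta1 * kl_old - beta2 * kl_virt)

b, n = 8, 4
probs_new = torch.softmax(torch.randn(b, n), -1)
probs_old = torch.softmax(torch.randn(b, n), -1)
memory = [torch.softmax(torch.randn(b, n), -1) for _ in range(3)]
acts = torch.randint(n, (b,))
loss = mcpo_loss(probs_new.gather(1, acts[:, None]).log().squeeze(1),
                 probs_old.gather(1, acts[:, None]).log().squeeze(1),
                 probs_new, probs_old, memory,
                 advantages=torch.randn(b))
print(loss.item())
```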