Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers
Following the major success of neural language models (LMs) such as BERT or
GPT-2 on a variety of language understanding tasks, recent work has focused on
injecting (structured) knowledge from external resources into these models.
On the one hand, joint pretraining (i.e., training from scratch, adding objectives based on external knowledge to the primary LM objective) may be prohibitively expensive computationally; on the other hand, post-hoc fine-tuning on external knowledge may lead to catastrophic forgetting of distributional knowledge. In this work, we investigate models for complementing
the distributional knowledge of BERT with conceptual knowledge from ConceptNet
and its corresponding Open Mind Common Sense (OMCS) corpus, respectively, using
adapter training. While overall results on the GLUE benchmark paint an
inconclusive picture, a deeper analysis reveals that our adapter-based models
substantially outperform BERT (up to 15-20 performance points) on inference
tasks that require the type of conceptual knowledge explicitly present in
ConceptNet and OMCS.
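As a rough illustration of the adapter mechanism the abstract refers to, here is a minimal sketch of a generic bottleneck adapter in PyTorch; the hidden and bottleneck sizes, the GELU activation, and the near-identity initialization are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: down-project, nonlinearity,
    up-project, residual connection. Inserted after a frozen
    transformer sub-layer; only these weights are trained on the
    external knowledge source (e.g., the OMCS corpus)."""
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()
        # Near-identity init so training starts from the original model.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Usage: wrap the output of each BERT layer while keeping BERT frozen.
adapter = BottleneckAdapter()
states = torch.randn(2, 16, 768)   # (batch, sequence, hidden)
print(adapter(states).shape)       # torch.Size([2, 16, 768])
```

Because the residual path starts as the identity, the pretrained model's behavior is preserved at initialization and only drifts where the knowledge-injection objective demands it.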
Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs
Recent graph-to-text models generate text from graph-based data using either
global or local aggregation to learn node representations. Global node encoding allows explicit communication between two distant nodes, but it neglects graph topology, since all nodes are directly connected. In contrast, local node
encoding considers the relations between neighbor nodes capturing the graph
structure, but it can fail to capture long-range relations. In this work, we
gather both encoding strategies, proposing novel neural models which encode an
input graph combining both global and local node contexts, in order to learn
better contextualized node embeddings. In our experiments, we demonstrate that
our approaches lead to significant improvements on two graph-to-text datasets, achieving BLEU scores of 18.01 on the AGENDA dataset and 63.69 on the WebNLG dataset for seen categories, outperforming state-of-the-art models by 3.7 and 3.1 points, respectively.
Comment: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2020; author's final version; pre-MIT Press publication version.
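A minimal sketch of the global-plus-local idea, assuming a gated fusion of an all-pairs attention context with an adjacency-masked neighbor average; the module name, sizes, and fusion rule are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalNodeEncoder(nn.Module):
    """Combine a 'global' context (attention over all node pairs,
    ignoring topology) with a 'local' context (mean over adjacent
    nodes, capturing structure) via a learned sigmoid gate."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.local = nn.Linear(dim, dim)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, x, adj):
        # x: (nodes, dim); adj: (nodes, nodes) binary adjacency matrix.
        scores = self.q(x) @ self.k(x).T / x.size(-1) ** 0.5
        global_ctx = F.softmax(scores, dim=-1) @ self.v(x)   # all pairs
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        local_ctx = self.local(adj @ x / deg)                # neighbors only
        g = torch.sigmoid(self.gate(torch.cat([global_ctx, local_ctx], -1)))
        return g * global_ctx + (1 - g) * local_ctx

enc = GlobalLocalNodeEncoder()
x = torch.randn(5, 128)
adj = (torch.rand(5, 5) > 0.6).float()
print(enc(x, adj).shape)  # torch.Size([5, 128])
```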
Investigating Pretrained Language Models for Graph-to-Text Generation
Graph-to-text generation aims to generate fluent texts from graph-based data.
In this paper, we investigate two recently proposed pretrained language models
(PLMs) and analyze the impact of different task-adaptive pretraining strategies
for PLMs in graph-to-text generation. We present a study across three graph
domains: meaning representations, Wikipedia knowledge graphs (KGs) and
scientific KGs. We show that the PLMs BART and T5 achieve new state-of-the-art
results and that task-adaptive pretraining strategies improve their performance
even further. In particular, we report new state-of-the-art BLEU scores of 49.72 on LDC2017T10, 59.70 on WebNLG, and 25.66 on AGENDA - relative improvements of 31.8%, 4.5%, and 42.4%, respectively. In an extensive analysis,
we identify possible reasons for the PLMs' success on graph-to-text tasks. We
find evidence that their knowledge about true facts helps them perform well
even when the input graph representation is reduced to a simple bag of node and
edge labels.
Comment: Our code and pretrained model checkpoints are available at https://github.com/UKPLab/plms-graph2tex
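To make the setup concrete, here is a minimal sketch of a linearized-graph pipeline with an off-the-shelf T5 from Hugging Face. The <H>/<R>/<T> marker format is a common convention and an assumption here, and the stock checkpoint is not the fine-tuned model from the linked repository.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Hypothetical linearization: each (head, relation, tail) triple is
# flattened with marker tokens into one input string.
def linearize(triples):
    return " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in triples)

triples = [("Alan Bean", "occupation", "astronaut"),
           ("Alan Bean", "mission", "Apollo 12")]

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer(linearize(triples), return_tensors="pt")
ids = model.generate(**inputs, max_length=48, num_beams=4)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
# An untuned t5-small will not verbalize the graph well; the point is
# only the input/output plumbing of a linearized-graph pipeline.
```

The "bag of node and edge labels" finding in the abstract corresponds to dropping the structural markers and order from such an input while the PLM still recovers plausible text.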
Modeling Graph Structure via Relative Position for Text Generation from Knowledge Graphs
We present Graformer, a novel Transformer-based encoder-decoder architecture
for graph-to-text generation. With our novel graph self-attention, the encoding
of a node relies on all nodes in the input graph - not only direct neighbors -
facilitating the detection of global patterns. We represent the relation
between two nodes as the length of the shortest path between them. Graformer
learns to weight these node-node relations differently for different attention
heads, thus virtually learning differently connected views of the input graph.
We evaluate Graformer on two popular graph-to-text generation benchmarks,
AGENDA and WebNLG, where it achieves strong performance while using many fewer parameters than other approaches.
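The sketch below illustrates the core mechanism described: all-pairs self-attention with a per-head learned bias indexed by shortest-path distance. Dimensions, the BFS routine, and the bias parameterization are assumptions, not Graformer's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def shortest_paths(adj: torch.Tensor, max_dist: int) -> torch.Tensor:
    """All-pairs shortest-path lengths by repeated frontier expansion;
    unreachable pairs are clamped to max_dist. adj is a float matrix."""
    n = adj.size(0)
    dist = torch.full((n, n), max_dist)
    dist.fill_diagonal_(0)
    reach = torch.eye(n, dtype=torch.bool)
    frontier = adj.bool()
    for d in range(1, max_dist):
        new = frontier & ~reach      # first reached at exactly d hops
        dist[new] = d
        reach |= new
        frontier = (frontier.float() @ adj).bool()
    return dist

class PathBiasedSelfAttention(nn.Module):
    """Every node attends to every node; each head adds a learned
    scalar bias indexed by shortest-path distance, so different heads
    can learn differently connected views of the graph."""
    def __init__(self, dim=64, heads=4, max_dist=8):
        super().__init__()
        self.heads, self.dk = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.bias = nn.Embedding(max_dist + 1, heads)  # one bias per head

    def forward(self, x, dist):
        n = x.size(0)
        q, k, v = self.qkv(x).view(n, 3, self.heads, self.dk).unbind(1)
        scores = torch.einsum("ihd,jhd->hij", q, k) / self.dk ** 0.5
        scores = scores + self.bias(dist).permute(2, 0, 1)
        out = torch.einsum("hij,jhd->ihd", F.softmax(scores, -1), v)
        return out.reshape(n, -1)

x = torch.randn(6, 64)
adj = (torch.rand(6, 6) > 0.5).float()
attn = PathBiasedSelfAttention()
print(attn(x, shortest_paths(adj, max_dist=8)).shape)  # torch.Size([6, 64])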
Saddles in the energy landscape probed by supercooled liquids
We numerically investigate the supercooled dynamics of two simple model liquids, exploiting the partition of the multi-dimensional configuration space into basins of attraction of the stationary points (inherent saddles) of the potential energy surface. We find that the order and potential energy of the inherent saddles are well-defined functions of the temperature T. Moreover, as T decreases, the saddle order vanishes at the same temperature (T_MCT) at which the inverse diffusivity appears to diverge as a power law. This allows a topological
interpretation of T_MCT: it marks the transition from a dynamics between basins
of saddles (T>T_MCT) to a dynamics between basins of minima (T<T_MCT).
Comment: 4 pages, 3 figures, to be published on PR
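To make the notion of saddle order concrete, here is a toy sketch: locate a stationary point of a potential energy surface by minimizing |grad V|^2, then count the negative Hessian eigenvalues. The 2D potential is an arbitrary illustrative choice, not the paper's model liquids.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative 2D potential energy surface (an assumption).
def V(x):
    return np.cos(x[0]) + np.cos(x[1]) + 0.1 * x @ x

def grad(x, h=1e-6):
    e = np.eye(2)
    return np.array([(V(x + h*e[i]) - V(x - h*e[i])) / (2*h)
                     for i in range(2)])

def hessian(x, h=1e-4):
    e = np.eye(2)
    return np.array([[(V(x + h*e[i] + h*e[j]) - V(x + h*e[i] - h*e[j])
                     - V(x - h*e[i] + h*e[j]) + V(x - h*e[i] - h*e[j]))
                     / (4*h*h) for j in range(2)] for i in range(2)])

# Stationary points (minima AND saddles) are minima of |grad V|^2.
res = minimize(lambda x: grad(x) @ grad(x), np.array([2.5, 0.5]))
eigs = np.linalg.eigvalsh(hessian(res.x))
print("stationary point:", res.x)
print("saddle order (negative eigenvalues):", int((eigs < 0).sum()))
# From this starting point the search lands on a first-order saddle.
```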
Relaxation processes in harmonic glasses?
A relaxation process, with the associated phenomenology of sound attenuation
and sound velocity dispersion, is found in a simulated harmonic Lennard-Jones
glass. We propose to identify this process with the so called microscopic (or
instantaneous) relaxation process observed in real glasses and supercooled
liquids. A model based on the memory function approach accounts for the observation and allows us to relate to one another: 1) the characteristic time and strength of this process, 2) the low-frequency limit of the dynamic structure factor of the glass, and 3) the high-frequency sound attenuation coefficient, with its observed quadratic dependence on the momentum transfer.
Comment: 11 pages, 3 figures
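A small numeric sketch of the kind of relation the memory-function approach yields, assuming an exponential kernel M(t) = D2 * exp(-t/tau) and a relaxation strength growing as q^2; the functional form of S(q,w) below is the standard single-relaxation result, and all parameter values are illustrative.

```python
import numpy as np

def S(w, q, c=1.0, tau=0.05, a=2.0):
    w0 = c * q                                      # bare acoustic frequency
    D2 = a * q**2                                   # relaxation strength ~ q^2
    Mp = D2 * tau / (1 + (w * tau) ** 2)            # Re M(w)
    Mpp = D2 * w * tau**2 / (1 + (w * tau) ** 2)    # Im M(w)
    return w0**2 * Mp / ((w**2 - w0**2 + w * Mpp) ** 2 + (w * Mp) ** 2)

w = np.linspace(1e-3, 30, 200_000)
for q in (2.0, 4.0):
    s = S(w, q)
    above = w[s > s.max() / 2]
    print(f"q={q}: Brillouin FWHM ~ {above[-1] - above[0]:.3f}")
# Doubling q roughly quadruples the peak width, i.e. Gamma ~ q^2,
# tying the kernel's time/strength to the attenuation coefficient.
```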
Elastic constant dishomogeneity and Q^2 dependence of the broadening of the dynamical structure factor in disordered systems
We propose an explanation for the quadratic dependence on the momentum Q of the broadening of the acoustic excitation peak recently found in the study of the dynamic structure factor of many real and simulated glasses. We ascribe the observed Q^2 law to the spatial fluctuations of the local wavelength of the collective vibrational modes, in turn produced by the dishomogeneity of the inter-particle elastic constants. This explanation is analytically shown to hold for 1-dimensional disordered chains and satisfactorily tested numerically in both 1 and 3 dimensions.
Comment: 4 pages, RevTeX, 5 postscript figures
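A sketch of the kind of 1-dimensional test mentioned: a unit-mass harmonic chain with randomly fluctuating spring constants, whose dynamical matrix is diagonalized and whose eigenmodes are projected on plane waves to obtain a crude S(q, w); system size and disorder strength are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
# Disordered (spatially fluctuating) spring constants, kept positive.
k = np.clip(1.0 + 0.5 * rng.standard_normal(N), 0.1, None)

# Dynamical matrix of a unit-mass chain with periodic boundaries.
D = np.zeros((N, N))
for i in range(N):
    j = (i + 1) % N
    D[i, i] += k[i]; D[j, j] += k[i]
    D[i, j] -= k[i]; D[j, i] -= k[i]

w2, modes = np.linalg.eigh(D)
w = np.sqrt(np.clip(w2, 0, None))

def spectrum(q, eta=0.02):
    """Plane-wave projection of each eigenmode, Lorentzian-broadened:
    the growth of the peak width with q can then be inspected."""
    phases = np.exp(1j * q * np.arange(N))
    weight = np.abs(phases @ modes) ** 2 / N
    grid = np.linspace(0, w.max(), 2000)
    return grid, (weight * eta / ((grid[:, None] - w) ** 2 + eta**2)).sum(1)

for m in (16, 32):                       # commensurate q = 2*pi*m/N
    grid, s = spectrum(2 * np.pi * m / N)
    print(f"m={m}: acoustic peak at w ~ {grid[s.argmax()]:.3f}")
```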
Moisture Control, Inoculant and Particle Size in Tropical Grass Silages
Decreased fermentation and spoilage losses, together with improved aerobic stability during feed-out, can be accomplished through several strategies, such as wilting, the addition of microbial additives, and moisture absorbents. Particle size reduction may increase bulk density and improve fermentation. The objective of this trial was to evaluate the effects of particle size, moisture content, and a microbial additive on chemical-physical parameters and losses in silages made from Tanzania grass.
Learning to reason over scene graphs: a case study of finetuning GPT-2 into a robot language model for grounded task planning
Long-horizon task planning is essential for the development of intelligent assistive and service robots. In this work, we investigate the applicability of a smaller class of large language models (LLMs), specifically GPT-2, to robotic task planning by learning to decompose tasks into subgoal specifications for a planner to execute sequentially. Our method grounds the input of the LLM in the domain, represented as a scene graph, enabling it to translate human requests into executable robot plans and thereby learning to reason over long-horizon tasks, as encountered in the ALFRED benchmark. We compare our approach with classical planning and baseline methods to examine the applicability and generalizability of LLM-based planners. Our findings suggest that the knowledge stored in an LLM can be effectively grounded to perform long-horizon task planning, demonstrating promising potential for the future application of neuro-symbolic planning methods in robotics.
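A hypothetical sketch of the grounding setup: the scene graph is serialized into the prompt together with the human request, and GPT-2 continues with a plan. The tag format and example content are invented for illustration, and the stock (non-finetuned) checkpoint will not produce valid subgoals.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Hypothetical serialization of a scene graph as relation triples.
scene = [("apple", "on", "countertop"), ("knife", "in", "drawer")]
request = "Slice the apple and put it in the fridge."

prompt = ("<scene> " + " ; ".join(f"{a} {r} {b}" for a, r, b in scene)
          + " </scene> <task> " + request + " </task> <plan>")

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

ids = tokenizer(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=40, do_sample=False,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0][ids.shape[1]:]))
# After finetuning on (scene graph, request, plan) examples, the
# continuation would be a subgoal sequence for a downstream planner,
# e.g. "(pick apple) (slice apple) ...".
```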