Pragmatically Informative Color Generation by Grounding Contextual Modifiers
Grounding language in contextual information is crucial for fine-grained natural language understanding. One important task that involves grounding contextual modifiers is color generation. Given a reference color green and a modifier bluey, how does one generate a color that could represent bluey green? We propose a computational pragmatics model that formulates this color generation task as a recursive game between speakers and listeners. In our model, a pragmatic speaker reasons about the inferences that a listener would make, and thus generates a modified color that is maximally informative in helping the listener recover the original referents. In this paper, we show that incorporating pragmatic information provides significant performance improvements over state-of-the-art deep learning models that lack pragmatic inference and the flexibility to represent colors from a large continuous space.
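As a rough illustration of the speaker-listener recursion described above, the sketch below scores candidate colors by how well a hypothetical literal listener would recover the intended referent. The candidate grid, the listener likelihood, and the rationality parameter are illustrative assumptions, not the paper's exact model.

```python
# Minimal sketch of a pragmatic (RSA-style) speaker for color modification.
# All numeric choices below are illustrative assumptions.
import numpy as np

def literal_listener(candidate, reference, modifier_direction, sigma=0.15):
    """Hypothetical literal listener: likelihood that `candidate` (RGB in [0,1]^3)
    is meant by shifting `reference` toward `modifier_direction`."""
    target = np.clip(reference + 0.3 * modifier_direction, 0.0, 1.0)
    return np.exp(-np.sum((candidate - target) ** 2) / (2 * sigma ** 2))

def pragmatic_speaker(reference, modifier_direction, candidates, alpha=5.0):
    """Pick the candidate color that is maximally informative, i.e. the one a
    listener would most reliably map back to the intended referent."""
    scores = np.array([literal_listener(c, reference, modifier_direction)
                       for c in candidates])
    probs = scores ** alpha            # speaker rationality: sharpen listener scores
    probs /= probs.sum()
    return candidates[np.argmax(probs)]

# Example: "bluey green" -- shift a green reference toward blue.
green = np.array([0.0, 0.6, 0.2])
toward_blue = np.array([0.0, -0.1, 0.8])
grid = np.random.default_rng(0).uniform(0, 1, size=(500, 3))  # candidate colors
print(pragmatic_speaker(green, toward_blue, grid))
```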
Context-Guided BERT for Targeted Aspect-Based Sentiment Analysis
Aspect-based sentiment analysis (ABSA) and Targeted ABSA (TABSA) allow
finer-grained inferences about sentiment to be drawn from the same text,
depending on context. For example, a given text can have different targets
(e.g., neighborhoods) and different aspects (e.g., price or safety), with
different sentiment associated with each target-aspect pair. In this paper, we
investigate whether adding context to self-attention models improves
performance on (T)ABSA. We propose two variants of Context-Guided BERT
(CG-BERT) that learn to distribute attention under different contexts. We first
adapt a context-aware Transformer to produce a CG-BERT that uses context-guided
softmax-attention. Next, we propose an improved Quasi-Attention CG-BERT model
that learns a compositional attention that supports subtractive attention. We
train both models with pretrained BERT on two (T)ABSA datasets: SentiHood and
SemEval-2014 (Task 4). Both models achieve new state-of-the-art results with
our QACG-BERT model having the best performance. Furthermore, we provide
analyses of the impact of context in our proposed models. Our work provides
more evidence for the utility of adding context-dependencies to pretrained
self-attention-based language models for context-based natural language tasks.
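To make the subtractive-attention idea concrete, here is a minimal sketch of a single quasi-attention head in PyTorch: a standard softmax attention map is combined with a context-driven term scaled to [-1, 1], so the model can subtract attention from tokens that are irrelevant to a given target-aspect context. The projections, gating, and scaling are illustrative assumptions rather than the exact QACG-BERT parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuasiAttentionHead(nn.Module):
    """Illustrative quasi-attention head supporting subtractive attention."""
    def __init__(self, d_model, d_head):
        super().__init__()
        self.q = nn.Linear(d_model, d_head)
        self.k = nn.Linear(d_model, d_head)
        self.v = nn.Linear(d_model, d_head)
        self.ctx_q = nn.Linear(d_model, d_head)  # context-guided query term
        self.ctx_k = nn.Linear(d_model, d_head)  # context-guided key term
        self.gate = nn.Linear(d_model, 1)        # per-token gate for the quasi term

    def forward(self, x, context):
        # x: (batch, seq, d_model); context: (batch, d_model), e.g. a target-aspect embedding
        q, k, v = self.q(x), self.k(x), self.v(x)
        scale = q.size(-1) ** 0.5
        attn = F.softmax(q @ k.transpose(-2, -1) / scale, dim=-1)  # standard attention in [0, 1]

        # Quasi-attention scores driven by the context vector; a sigmoid rescaled to
        # [-1, 1] lets the model *subtract* attention from irrelevant tokens.
        cq = self.ctx_q(context).unsqueeze(1)                       # (batch, 1, d_head)
        ck = self.ctx_k(x)                                          # (batch, seq, d_head)
        quasi = 2 * torch.sigmoid(cq @ ck.transpose(-2, -1) / scale) - 1

        gate = torch.sigmoid(self.gate(x))        # how much quasi-attention to mix in
        combined = attn + gate * quasi            # entries may now be negative
        return combined @ v

# Example usage:
# x = torch.randn(2, 10, 768); ctx = torch.randn(2, 768)
# out = QuasiAttentionHead(768, 64)(x, ctx)      # (2, 10, 64)
```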
ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation
Compositional generalization benchmarks seek to assess whether models can
accurately compute meanings for novel sentences, but operationalize this in
terms of logical form (LF) prediction. This raises the concern that
semantically irrelevant details of the chosen LFs could shape model
performance. We argue that this concern is realized for the COGS benchmark (Kim
and Linzen, 2020). COGS poses generalization splits that appear impossible for
present-day models, which could be taken as an indictment of those models.
However, we show that the negative results trace to incidental features of COGS
LFs. Converting these LFs to semantically equivalent ones and factoring out
capabilities unrelated to semantic interpretation, we find that even baseline
models get traction. A recent variable-free translation of COGS LFs suggests
similar conclusions, but we observe this format is not semantically equivalent;
it is incapable of accurately representing some COGS meanings. These findings
inform our proposal for ReCOGS, a modified version of COGS that comes closer to
assessing the target semantic capabilities while remaining very challenging.
Overall, our results reaffirm the importance of compositional generalization
and careful benchmark task design.
Comment: 10 pages, 5 figures
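A toy example of the kind of incidental LF detail at issue: if two logical forms differ only in how bound variables are numbered, an exact-match evaluation treats them as different meanings even though they are semantically equivalent. The sketch below is a hypothetical illustration (not code or data from the paper) that canonicalizes variable names before comparison.

```python
import re

def canonicalize(lf: str) -> str:
    """Rename variables x_i to a canonical numbering in order of first appearance."""
    mapping = {}
    def rename(match):
        var = match.group(0)
        if var not in mapping:
            mapping[var] = f"x_{len(mapping) + 1}"
        return mapping[var]
    return re.sub(r"x_\d+", rename, lf)

# Two illustrative LFs that differ only in variable numbering:
lf_a = "cat ( x_1 ) AND sleep . agent ( x_4 , x_1 )"
lf_b = "cat ( x_7 ) AND sleep . agent ( x_2 , x_7 )"
print(canonicalize(lf_a) == canonicalize(lf_b))  # True: equivalent up to renaming
```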
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
The information stored in large language models (LLMs) falls out of date
quickly, and retraining from scratch is often not an option. This has recently
given rise to a range of techniques for injecting new facts through updating
model weights. Current evaluation paradigms are extremely limited, mainly
validating the recall of edited facts, but changing one fact should cause
rippling changes to the model's related beliefs. If we edit the UK Prime
Minister to now be Rishi Sunak, then we should get a different answer to "Who
is married to the British Prime Minister?" In this work, we present a benchmark
MQuAKE (Multi-hop Question Answering for Knowledge Editing) comprising
multi-hop questions that assess whether edited models correctly answer
questions where the answer should change as an entailed consequence of edited
facts. While we find that current knowledge-editing approaches can recall
edited facts accurately, they fail catastrophically on the constructed
multi-hop questions. We thus propose a simple memory-based approach, MeLLo,
which stores all edited facts externally while prompting the language model
iteratively to generate answers that are consistent with the edited facts.
While MQuAKE remains challenging, we show that MeLLo scales well with LLMs (up
to 175B) and outperforms previous model editors by a large margin.
Comment: Our code and datasets are available at https://github.com/princeton-nlp/MQuAKE
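A minimal sketch of a MeLLo-style loop, assuming a hypothetical `generate` callable for the base language model and a hypothetical `retrieve` callable over the external memory of edited facts; the prompt format and stopping rule are illustrative, not the paper's implementation.

```python
from typing import Callable, List, Optional

def mello_answer(question: str,
                 edited_facts: List[str],
                 generate: Callable[[str], str],
                 retrieve: Callable[[str, List[str]], Optional[str]],
                 max_hops: int = 4) -> str:
    """Answer a multi-hop question while staying consistent with edited facts.

    generate(prompt): base LM proposes the next subquestion and a tentative answer.
    retrieve(query, facts): returns the most relevant edited fact, or None.
    """
    context = f"Question: {question}\n"
    answer = ""
    for _ in range(max_hops):
        step = generate(context + "Next subquestion and tentative answer:\n")
        fact = retrieve(step, edited_facts)          # self-check against the memory
        if fact is not None:
            # An edited fact is relevant: re-prompt so this hop's answer reflects it.
            step = generate(context + f"Edited fact: {fact}\nRevise this hop:\n{step}\n")
        context += step + "\n"
        answer = step
        if "Final answer" in step:                   # hypothetical stopping marker
            break
    return answer
```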
Higher-order Topological and Nodal Superconductors MS (M = Nb and Ta) Transition-metal Sulfides
Intrinsic topological superconducting materials are exotic and vital for
developing next-generation topological superconducting devices, topological
quantum computation, and quantum information technologies. Here, we predict
the topological and nodal superconductivity of the MS (M = Nb and Ta)
transition-metal sulfides using density functional theory for superconductors
combined with symmetry indicators. We reveal their higher-order topological
nature with an index of Z4 = 2. These materials have a higher Tc than the
elemental Nb or Ta superconductors owing to their flat bands and strong
electron-phonon coupling. Electron doping and lighter isotopes can effectively
enhance the Tc. Our findings show that the MS (M = Nb and Ta) systems can
serve as new platforms for studying exotic physics in higher-order topological
superconductors, and provide theoretical support for using them as topological
superconducting devices in advanced topological quantum computation and
information technologies.
Comment: 5 pages, 3 figures
Dynabench: Rethinking Benchmarking in NLP
We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target model will misclassify, but that another person will not. In this paper, we argue that Dynabench addresses a critical need in our community: contemporary models quickly achieve outstanding performance on benchmark tasks but nonetheless fail on simple challenge examples and falter in real-world scenarios. With Dynabench, dataset creation, model development, and model assessment can directly inform each other, leading to more robust and informative benchmarks. We report on four initial NLP tasks, illustrating these concepts and highlighting the promise of the platform, and address potential objections to dynamic benchmarking as a new standard for the field.
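A minimal sketch of one human-and-model-in-the-loop collection round in the spirit of Dynabench, assuming hypothetical `annotator`, `target_model`, and `validators` callables; the acceptance rule and interfaces are illustrative, not the platform's actual API.

```python
from typing import Callable, List, Tuple

def collect_round(annotator: Callable[[], Tuple[str, str]],
                  target_model: Callable[[str], str],
                  validators: List[Callable[[str], str]],
                  budget: int = 100) -> List[Tuple[str, str]]:
    """Keep examples that fool the target model but that other people label
    the same way the annotator did."""
    dataset = []
    for _ in range(budget):
        text, gold = annotator()                     # human writes an example + label
        if target_model(text) == gold:
            continue                                 # model got it right: not adversarial
        votes = [v(text) for v in validators]        # independent human checks
        if votes.count(gold) > len(votes) / 2:       # majority agrees with the annotator
            dataset.append((text, gold))
    return dataset
```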