Understanding and Improving Language Models Through a Data-Centric Lens
Training data has played a major role in the rise of large deep learning models. In particular, the scale and diversity of training data have led to incredible new capabilities in large language models. However, despite the success of such models, a notable gap persists in understanding the important role that data plays in their performance, and how to use this understanding to further improve models. In this work, we advocate for, and demonstrate the effectiveness of, data-centric AI.

In the first part of this dissertation, we aim to better understand language models through their data. First, we design a relation extraction system that produces human-interpretable intermediate outputs, allowing us to better understand why the system makes its predictions. Next, we delve into the intricate relationship between data and models by studying zero-shot and few-shot transfer learning settings, giving us insights into the effects that training data has on model performance across diverse tasks.

Building on the lessons from the first part of this dissertation, we next aim to improve the data used to train models. We first demonstrate that data selection can be formulated as a multi-armed bandit problem, where the goal is to optimize a model's training data. We apply the multi-armed bandit formulation first to the few-shot fine-tuning setting, and then to language model pretraining, designing algorithms and rewards unique to each problem setting. Finally, we show that for cross-lingual question answering, data augmentation is a strong approach to improving the diversity of training data, leading to improved performance.

Overall, this work aims to improve our understanding of how deep learning models work, using data as the viewpoint. Further, we take this understanding and use it to develop data-efficient and performant models.
We conclude the dissertation with discussions of future research in data-centric AI and propose avenues for extending these concepts into new research directions.
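The multi-armed bandit view of data selection described above can be sketched as follows. This is a hedged, minimal illustration rather than the dissertation's actual algorithms: the arms stand for candidate training-data sources, the reward is a simulated validation-improvement signal, and the `ucb_select`/`run_bandit` names are invented for this sketch.

```python
import math
import random

def ucb_select(counts, values, t, c=2.0):
    """Pick the arm (data source) with the highest upper confidence bound."""
    best, best_score = 0, float("-inf")
    for arm in range(len(counts)):
        if counts[arm] == 0:
            return arm  # try each source at least once
        bonus = c * math.sqrt(math.log(t) / counts[arm])
        score = values[arm] + bonus
        if score > best_score:
            best, best_score = arm, score
    return best

def run_bandit(true_rewards, steps=2000, seed=0):
    """Simulate data selection: reward = noisy validation gain per source."""
    rng = random.Random(seed)
    k = len(true_rewards)
    counts, values = [0] * k, [0.0] * k
    for t in range(1, steps + 1):
        arm = ucb_select(counts, values, t)
        reward = true_rewards[arm] + rng.gauss(0, 0.1)  # noisy feedback
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
    return counts

# Source 1 yields the largest (simulated) validation gains, so the
# bandit should allocate most of the sampling budget to it.
counts = run_bandit([0.1, 0.5, 0.2])
```

The key design question in each setting, as the abstract notes, is what reward signal to use; the simulated gain above merely stands in for that choice.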
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning
Large Language Models (LLMs) have shown human-like reasoning abilities but
still struggle with complex logical problems. This paper introduces a novel
framework, Logic-LM, which integrates LLMs with symbolic solvers to improve
logical problem-solving. Our method first utilizes LLMs to translate a natural
language problem into a symbolic formulation. Afterward, a deterministic
symbolic solver performs inference on the formulated problem. We also introduce
a self-refinement module, which utilizes the symbolic solver's error messages
to revise symbolic formalizations. We demonstrate Logic-LM's effectiveness on
five logical reasoning datasets: ProofWriter, PrOntoQA, FOLIO,
LogicalDeduction, and AR-LSAT. On average, Logic-LM achieves a significant
performance boost of 39.2% over using LLM alone with standard prompting and
18.4% over LLM with chain-of-thought prompting. Our findings suggest that
Logic-LM, by combining LLMs with symbolic logic, offers a promising avenue for
faithful logical reasoning. Code and data are publicly available at
https://github.com/teacherpeterpan/Logic-LLM.
Comment: EMNLP 2023 (Findings, long paper).
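The translate-solve-refine loop described above can be sketched in miniature. This is only a control-flow illustration, not the released system: the `translate` and `refine` stubs are hypothetical stand-ins for LLM calls, and a toy forward-chaining routine stands in for the deterministic symbolic solver.

```python
def solve(formulation):
    """Toy deterministic 'solver': forward-chains over Horn-style rules."""
    facts = set(formulation.get("facts", []))
    rules = formulation.get("rules", [])  # list of (premises, conclusion)
    if not isinstance(rules, list):
        raise ValueError("rules must be a list of (premises, conclusion) pairs")
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return formulation["query"] in facts

def logic_lm(problem, translate, refine, max_rounds=3):
    """LLM translates; solver infers; solver errors drive self-refinement."""
    formulation = translate(problem)
    for _ in range(max_rounds):
        try:
            return solve(formulation)
        except ValueError as err:                        # solver error message...
            formulation = refine(formulation, str(err))  # ...feeds refinement
    raise RuntimeError("could not produce a solvable formulation")

# Hypothetical stand-ins: the first translation is malformed, and is
# repaired once the solver's error message comes back.
def translate(problem):
    return {"facts": ["rain"], "rules": {"bad": None}, "query": "wet"}

def refine(formulation, error):
    return {**formulation, "rules": [(["rain"], "wet")]}

answer = logic_lm("It rains. If it rains, the grass gets wet. Is the grass wet?",
                  translate, refine)
```

The self-refinement module in the paper works analogously: the symbolic solver's error messages are fed back to revise the symbolic formalization before re-solving.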
Data-Efficiency with a Single GPU: An Exploration of Transfer Methods for Small Language Models
Multi-task learning (MTL), instruction tuning, and prompting have recently
been shown to improve the generalizability of large language models to new
tasks. However, the benefits of such methods are less well-documented in
smaller language models, with some studies finding contradictory results. In
this work, we explore and isolate the effects of (i) model size, (ii) general
purpose MTL, (iii) in-domain MTL, (iv) instruction tuning, and (v) few-shot
fine-tuning for models with fewer than 500 million parameters. Our experiments
in the zero-shot setting demonstrate that models gain 31% relative improvement,
on average, from general purpose MTL, with an additional 37.6% relative gain
from in-domain MTL. In contrast to prior findings on large models, we find that
instruction tuning provides a modest 2% performance improvement for small
models.
Emotion Recognition in Conversation using Probabilistic Soft Logic
Creating agents that can both appropriately respond to conversations and
understand complex human linguistic tendencies and social cues has been a long
standing challenge in the NLP community. A recent pillar of research revolves
around emotion recognition in conversation (ERC); a sub-field of emotion
recognition that focuses on conversations or dialogues that contain two or more
utterances. In this work, we explore an approach to ERC that exploits the use
of neural embeddings along with complex structures in dialogues. We implement
our approach in a framework called Probabilistic Soft Logic (PSL), a
declarative templating language that uses first-order-like logical rules which,
when combined with data, define a particular class of graphical model.
Additionally, PSL provides functionality for the incorporation of results from
neural models into PSL models. This allows our model to take advantage of
advanced neural methods, such as sentence embeddings, and logical reasoning
over the structure of a dialogue. We compare our method with state-of-the-art
purely neural ERC systems, and see almost a 20% improvement. With these
results, we provide an extensive qualitative and quantitative analysis over the
DailyDialog conversation dataset.
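The way PSL scores weighted first-order rules can be sketched numerically. In PSL, a ground implication body -> head has a "distance to satisfaction" of max(0, body - head) under Lukasiewicz semantics, and inference minimizes the weighted sum of these distances. The specific rules, weights, and truth values below are hypothetical; a real ERC model would ground rules such as Neural(U, E) -> Emotion(U, E) over a whole dialogue and plug in neural sentence-embedding scores for the Neural predicate.

```python
def lukasiewicz_and(*truths):
    """Lukasiewicz conjunction over soft truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))

def distance_to_satisfaction(body, head):
    """How far a ground implication body -> head is from being satisfied."""
    return max(0.0, body - head)

# Soft truth values in [0, 1]; neural_score would come from a classifier.
neural_score = 0.9  # Neural(u1, "joy"): classifier score for utterance u1
emotion_u1 = 0.8    # Emotion(u1, "joy"): free variable at inference time
reply = 1.0         # Reply(u2, u1): observed dialogue structure
emotion_u2 = 0.3    # Emotion(u2, "joy")

w1, w2 = 1.0, 0.5   # rule weights
penalty = (
    # Rule 1: Neural(u1, e) -> Emotion(u1, e)
    w1 * distance_to_satisfaction(neural_score, emotion_u1)
    # Rule 2: Emotion(u1, e) & Reply(u2, u1) -> Emotion(u2, e)
    + w2 * distance_to_satisfaction(lukasiewicz_and(emotion_u1, reply),
                                    emotion_u2)
)
```

Minimizing such penalties over the free emotion variables is what lets the model combine neural evidence with logical reasoning over dialogue structure.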
CausalDialogue: Modeling Utterance-level Causality in Conversations
Despite their widespread adoption, neural conversation models have yet to
exhibit natural chat capabilities with humans. In this research, we examine
user utterances as causes and generated responses as effects, recognizing that
changes in a cause should produce a different effect. To further explore this
concept, we have compiled and expanded upon a new dataset called CausalDialogue
through crowd-sourcing. This dataset includes multiple cause-effect pairs
within a directed acyclic graph (DAG) structure. Our analysis reveals that
traditional loss functions struggle to effectively incorporate the DAG
structure, leading us to propose a causality-enhanced method called Exponential
Maximum Average Treatment Effect (ExMATE) to enhance the impact of causality at
the utterance level in training neural conversation models. To evaluate the
needs of considering causality in dialogue generation, we built a comprehensive
benchmark on CausalDialogue dataset using different models, inference, and
training methods. Through experiments, we find that a causality-inspired loss
like ExMATE can improve the diversity and agility of a conventional loss
function, and that there is still room for improvement to reach human-level
quality on this new dataset.
Comment: Accepted to ACL-Findings 2023.
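The dataset structure described above can be sketched as a small data model: utterances are nodes of a directed acyclic graph, and each edge is a cause (utterance) -> effect (response) pair, so one cause can branch into several alternative effects. The example dialogue and class names below are invented; only the DAG layout reflects the abstract.

```python
from collections import defaultdict

class DialogueDAG:
    """Dialogue as a DAG of utterances with cause -> effect edges."""

    def __init__(self):
        self.text = {}                     # utterance id -> surface text
        self.children = defaultdict(list)  # cause id -> list of effect ids

    def add_utterance(self, uid, text):
        self.text[uid] = text

    def add_effect(self, cause_id, effect_id):
        self.children[cause_id].append(effect_id)

    def cause_effect_pairs(self):
        """All (cause, effect) utterance pairs usable as training examples."""
        return [(c, e) for c, es in self.children.items() for e in es]

dag = DialogueDAG()
dag.add_utterance("u1", "Shall we rest at the inn?")
dag.add_utterance("u2a", "Good idea, I'm exhausted.")   # one possible effect
dag.add_utterance("u2b", "No, we should keep moving.")  # alternative effect
dag.add_effect("u1", "u2a")
dag.add_effect("u1", "u2b")
```

Branching of this kind is what a plain per-pair loss flattens away, and what the causality-enhanced training objective is meant to exploit.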
Modeling Disclosive Transparency in NLP Application Descriptions
Broader disclosive transparency (truth and clarity in communication
regarding the function of AI systems) is widely considered desirable.
Unfortunately, it is a nebulous concept, difficult to both define and quantify.
This is problematic, as previous work has demonstrated possible trade-offs and
negative consequences to disclosive transparency, such as a confusion effect,
where "too much information" clouds a reader's understanding of what a system
description means. Disclosive transparency's subjective nature has rendered
deep study into these problems and their remedies difficult. To improve this
state of affairs, we introduce neural language model-based probabilistic
metrics to directly model disclosive transparency, and demonstrate that they
correlate with user and expert opinions of system transparency, making them a
valid objective proxy. Finally, we demonstrate the use of these metrics in a
pilot study quantifying the relationships between transparency, confusion, and
user perceptions in a corpus of real NLP system descriptions.
Comment: To appear at EMNLP 2021. 15 pages, 10 figures, 7 tables.
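The shape of a language model-based probabilistic metric can be illustrated with a toy stand-in. The paper's metrics use neural language models; here, purely for illustration, an add-alpha unigram model plays that role: a description's perplexity serves as a probabilistic score one could correlate with human judgments. The function name and corpus are invented for this sketch.

```python
import math
from collections import Counter

def unigram_perplexity(description, corpus, alpha=1.0):
    """Perplexity of a description under an add-alpha unigram model."""
    counts = Counter(w for text in corpus for w in text.lower().split())
    vocab = len(counts) + 1  # +1 reserves mass for unseen words
    total = sum(counts.values())
    tokens = description.lower().split()
    log_prob = sum(
        math.log((counts[w] + alpha) / (total + alpha * vocab)) for w in tokens
    )
    return math.exp(-log_prob / len(tokens))

# A description phrased in familiar terms scores lower (less surprising)
# than one packed with unseen jargon, under this toy model.
corpus = ["the system tags text", "the system answers questions"]
plain = unigram_perplexity("the system tags text", corpus)
dense = unigram_perplexity("stochastic parse lattices", corpus)
```

A neural LM replaces the unigram counts in the actual metrics, but the proxy logic (score descriptions probabilistically, then correlate with user and expert opinions) is the same.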
NeuPSL: Neural Probabilistic Soft Logic
We present Neural Probabilistic Soft Logic (NeuPSL), a novel neuro-symbolic
(NeSy) framework that unites state-of-the-art symbolic reasoning with the
low-level perception of deep neural networks. To explicitly model the boundary
between neural and symbolic representations, we introduce NeSy Energy-Based
Models, a general family of energy-based models that combine neural and
symbolic reasoning. Using this framework, we show how to seamlessly integrate
neural and symbolic parameter learning and inference. We perform an extensive
empirical evaluation and show that NeuPSL outperforms existing methods on joint
inference and has significantly lower variance in almost all settings.
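The energy-based framing can be sketched in a few lines. This is a hedged toy, not NeuPSL's actual objective: the neural energy is the negative log-probability of a label from some classifier, the symbolic energy penalizes violating a soft rule that two linked items share a label, and prediction jointly minimizes their sum. All names and numbers are invented for illustration.

```python
import math

def neural_energy(probs, label):
    """Low-level perception term: negative log-probability of a label."""
    return -math.log(probs[label])

def symbolic_energy(y1, y2, weight=2.0):
    """Symbolic term: penalize violating a soft 'same label' rule."""
    return weight * (0.0 if y1 == y2 else 1.0)

def predict(probs_a, probs_b, labels=("pos", "neg")):
    """Joint inference: pick the label pair with minimum total energy."""
    best, best_e = None, float("inf")
    for ya in labels:
        for yb in labels:
            e = (neural_energy(probs_a, ya)
                 + neural_energy(probs_b, yb)
                 + symbolic_energy(ya, yb))
            if e < best_e:
                best, best_e = (ya, yb), e
    return best

# The classifier weakly prefers different labels for the two items, but
# the symbolic rule pulls the pair toward agreement.
pair = predict({"pos": 0.6, "neg": 0.4}, {"pos": 0.45, "neg": 0.55})
```

Explicitly separating the two energy terms is what makes the boundary between neural and symbolic representations visible, which is the point of the NeSy Energy-Based Model formulation.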