Understanding and Improving Language Models Through a Data-Centric Lens
Training data has played a major role in the rise of large deep learning models. In particular, the scale and diversity of training data have led to incredible new capabilities in large language models. However, despite the success of such models, a notable gap persists in understanding the important role that data plays in their performance, and how to use this understanding to further improve models. In this work, we advocate for, and demonstrate the effectiveness of, data-centric AI.

In the first part of this dissertation, we aim to better understand language models through their data. First, we design a relation extraction system that produces human-interpretable intermediate outputs, allowing us to better understand why the system makes its predictions. Next, we delve into the intricate relationship between data and models by studying zero-shot and few-shot transfer learning settings, giving us insights into the effects that training data has on model performance across diverse tasks.

Building on the lessons from the first part of this dissertation, we next aim to improve the data used to train models. We first demonstrate that data selection can be formulated as a multi-armed bandit problem, where the goal is to optimize a model's training data. We apply the multi-armed bandit formulation first to the few-shot fine-tuning setting, and then to language model pretraining, designing algorithms and rewards unique to each problem setting. Finally, we show that for cross-lingual question answering, data augmentation is a strong approach to improving the diversity of training data, leading to improved performance.

Overall, this work aims to improve our understanding of how deep learning models work, using data as the viewpoint. Further, we take this understanding and use it to develop data-efficient and performant models.
We conclude the dissertation with discussions of future research in data-centric AI and propose avenues for extending these concepts into new research directions.
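The multi-armed bandit view of data selection described above can be sketched as follows. This is a hedged, minimal illustration rather than the dissertation's actual algorithms: the arms stand for candidate training-data sources, the reward is a simulated validation-improvement signal, and the `ucb_select`/`run_bandit` names are invented for this sketch.

```python
import math
import random

def ucb_select(counts, values, t, c=2.0):
    """Pick the arm (data source) with the highest upper confidence bound."""
    best, best_score = 0, float("-inf")
    for arm in range(len(counts)):
        if counts[arm] == 0:
            return arm  # try each source at least once
        bonus = c * math.sqrt(math.log(t) / counts[arm])
        score = values[arm] + bonus
        if score > best_score:
            best, best_score = arm, score
    return best

def run_bandit(true_rewards, steps=2000, seed=0):
    """Simulate data selection: reward = noisy validation gain per source."""
    rng = random.Random(seed)
    k = len(true_rewards)
    counts, values = [0] * k, [0.0] * k
    for t in range(1, steps + 1):
        arm = ucb_select(counts, values, t)
        reward = true_rewards[arm] + rng.gauss(0, 0.1)  # noisy feedback
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
    return counts

# Source 1 yields the largest (simulated) validation gains, so the
# bandit should allocate most of the sampling budget to it.
counts = run_bandit([0.1, 0.5, 0.2])
```

The key design question in each setting, as the abstract notes, is what reward signal to use; the simulated gain above merely stands in for that choice.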
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning
Large Language Models (LLMs) have shown human-like reasoning abilities but
still struggle with complex logical problems. This paper introduces a novel
framework, Logic-LM, which integrates LLMs with symbolic solvers to improve
logical problem-solving. Our method first utilizes LLMs to translate a natural
language problem into a symbolic formulation. Afterward, a deterministic
symbolic solver performs inference on the formulated problem. We also introduce
a self-refinement module, which utilizes the symbolic solver's error messages
to revise symbolic formalizations. We demonstrate Logic-LM's effectiveness on
five logical reasoning datasets: ProofWriter, PrOntoQA, FOLIO,
LogicalDeduction, and AR-LSAT. On average, Logic-LM achieves a significant
performance boost of 39.2% over using LLM alone with standard prompting and
18.4% over LLM with chain-of-thought prompting. Our findings suggest that
Logic-LM, by combining LLMs with symbolic logic, offers a promising avenue for
faithful logical reasoning. Code and data are publicly available at
https://github.com/teacherpeterpan/Logic-LLM.
Comment: EMNLP 2023 (Findings, long paper).
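The translate-solve-refine loop described above can be sketched in miniature. This is only a control-flow illustration, not the released system: the `translate` and `refine` stubs are hypothetical stand-ins for LLM calls, and a toy forward-chaining routine stands in for the deterministic symbolic solver.

```python
def solve(formulation):
    """Toy deterministic 'solver': forward-chains over Horn-style rules."""
    facts = set(formulation.get("facts", []))
    rules = formulation.get("rules", [])  # list of (premises, conclusion)
    if not isinstance(rules, list):
        raise ValueError("rules must be a list of (premises, conclusion) pairs")
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return formulation["query"] in facts

def logic_lm(problem, translate, refine, max_rounds=3):
    """LLM translates; solver infers; solver errors drive self-refinement."""
    formulation = translate(problem)
    for _ in range(max_rounds):
        try:
            return solve(formulation)
        except ValueError as err:                        # solver error message...
            formulation = refine(formulation, str(err))  # ...feeds refinement
    raise RuntimeError("could not produce a solvable formulation")

# Hypothetical stand-ins: the first translation is malformed, and is
# repaired once the solver's error message comes back.
def translate(problem):
    return {"facts": ["rain"], "rules": {"bad": None}, "query": "wet"}

def refine(formulation, error):
    return {**formulation, "rules": [(["rain"], "wet")]}

answer = logic_lm("It rains. If it rains, the grass gets wet. Is the grass wet?",
                  translate, refine)
```

The self-refinement module in the paper works analogously: the symbolic solver's error messages are fed back to revise the symbolic formalization before re-solving.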
Data-Efficiency with a Single GPU: An Exploration of Transfer Methods for Small Language Models
Multi-task learning (MTL), instruction tuning, and prompting have recently
been shown to improve the generalizability of large language models to new
tasks. However, the benefits of such methods are less well-documented in
smaller language models, with some studies finding contradictory results. In
this work, we explore and isolate the effects of (i) model size, (ii) general
purpose MTL, (iii) in-domain MTL, (iv) instruction tuning, and (v) few-shot
fine-tuning for models with fewer than 500 million parameters. Our experiments
in the zero-shot setting demonstrate that models gain 31% relative improvement,
on average, from general purpose MTL, with an additional 37.6% relative gain
from in-domain MTL. In contrast to prior findings on large models, we find that
instruction tuning provides a modest 2% performance improvement for small
models.
Emotion Recognition in Conversation using Probabilistic Soft Logic
Creating agents that can both appropriately respond to conversations and
understand complex human linguistic tendencies and social cues has been a long
standing challenge in the NLP community. A recent pillar of research revolves
around emotion recognition in conversation (ERC); a sub-field of emotion
recognition that focuses on conversations or dialogues that contain two or more
utterances. In this work, we explore an approach to ERC that exploits the use
of neural embeddings along with complex structures in dialogues. We implement
our approach in a framework called Probabilistic Soft Logic (PSL), a
declarative templating language that uses first-order-like logical rules which,
when combined with data, define a particular class of graphical model.
Additionally, PSL provides functionality for the incorporation of results from
neural models into PSL models. This allows our model to take advantage of
advanced neural methods, such as sentence embeddings, and logical reasoning
over the structure of a dialogue. We compare our method with state-of-the-art
purely neural ERC systems, and see almost a 20% improvement. With these
results, we provide an extensive qualitative and quantitative analysis over the
DailyDialog conversation dataset.
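The way PSL scores weighted first-order rules can be sketched numerically. In PSL, a ground implication body -> head has a "distance to satisfaction" of max(0, body - head) under Lukasiewicz semantics, and inference minimizes the weighted sum of these distances. The specific rules, weights, and truth values below are hypothetical; a real ERC model would ground rules such as Neural(U, E) -> Emotion(U, E) over a whole dialogue and plug in neural sentence-embedding scores for the Neural predicate.

```python
def lukasiewicz_and(*truths):
    """Lukasiewicz conjunction over soft truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))

def distance_to_satisfaction(body, head):
    """How far a ground implication body -> head is from being satisfied."""
    return max(0.0, body - head)

# Soft truth values in [0, 1]; neural_score would come from a classifier.
neural_score = 0.9  # Neural(u1, "joy"): classifier score for utterance u1
emotion_u1 = 0.8    # Emotion(u1, "joy"): free variable at inference time
reply = 1.0         # Reply(u2, u1): observed dialogue structure
emotion_u2 = 0.3    # Emotion(u2, "joy")

w1, w2 = 1.0, 0.5   # rule weights
penalty = (
    # Rule 1: Neural(u1, e) -> Emotion(u1, e)
    w1 * distance_to_satisfaction(neural_score, emotion_u1)
    # Rule 2: Emotion(u1, e) & Reply(u2, u1) -> Emotion(u2, e)
    + w2 * distance_to_satisfaction(lukasiewicz_and(emotion_u1, reply),
                                    emotion_u2)
)
```

Minimizing such penalties over the free emotion variables is what lets the model combine neural evidence with logical reasoning over dialogue structure.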
CausalDialogue: Modeling Utterance-level Causality in Conversations
Despite their widespread adoption, neural conversation models have yet to
exhibit natural chat capabilities with humans. In this research, we examine
user utterances as causes and generated responses as effects, recognizing that
changes in a cause should produce a different effect. To further explore this
concept, we have compiled and expanded upon a new dataset called CausalDialogue
through crowd-sourcing. This dataset includes multiple cause-effect pairs
within a directed acyclic graph (DAG) structure. Our analysis reveals that
traditional loss functions struggle to effectively incorporate the DAG
structure, leading us to propose a causality-enhanced method called Exponential
Maximum Average Treatment Effect (ExMATE) to enhance the impact of causality at
the utterance level in training neural conversation models. To evaluate the
needs of considering causality in dialogue generation, we built a comprehensive
benchmark on CausalDialogue dataset using different models, inference, and
training methods. Through experiments, we find that a causality-inspired loss
like ExMATE can improve the diversity and agility of a conventional loss
function, and that there is still room for improvement to reach human-level
quality on this new dataset.
Comment: Accepted to ACL-Findings 2023.
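The dataset structure described above can be sketched as a small data model: utterances are nodes of a directed acyclic graph, and each edge is a cause (utterance) -> effect (response) pair, so one cause can branch into several alternative effects. The example dialogue and class names below are invented; only the DAG layout reflects the abstract.

```python
from collections import defaultdict

class DialogueDAG:
    """Dialogue as a DAG of utterances with cause -> effect edges."""

    def __init__(self):
        self.text = {}                     # utterance id -> surface text
        self.children = defaultdict(list)  # cause id -> list of effect ids

    def add_utterance(self, uid, text):
        self.text[uid] = text

    def add_effect(self, cause_id, effect_id):
        self.children[cause_id].append(effect_id)

    def cause_effect_pairs(self):
        """All (cause, effect) utterance pairs usable as training examples."""
        return [(c, e) for c, es in self.children.items() for e in es]

dag = DialogueDAG()
dag.add_utterance("u1", "Shall we rest at the inn?")
dag.add_utterance("u2a", "Good idea, I'm exhausted.")   # one possible effect
dag.add_utterance("u2b", "No, we should keep moving.")  # alternative effect
dag.add_effect("u1", "u2a")
dag.add_effect("u1", "u2b")
```

Branching of this kind is what a plain per-pair loss flattens away, and what the causality-enhanced training objective is meant to exploit.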
Modeling Disclosive Transparency in NLP Application Descriptions
Broader disclosive transparency (truth and clarity in communication
regarding the function of AI systems) is widely considered desirable.
Unfortunately, it is a nebulous concept, difficult to both define and quantify.
This is problematic, as previous work has demonstrated possible trade-offs and
negative consequences to disclosive transparency, such as a confusion effect,
where "too much information" clouds a reader's understanding of what a system
description means. Disclosive transparency's subjective nature has rendered
deep study into these problems and their remedies difficult. To improve this
state of affairs, we introduce neural language model-based probabilistic
metrics to directly model disclosive transparency, and demonstrate that they
correlate with user and expert opinions of system transparency, making them a
valid objective proxy. Finally, we demonstrate the use of these metrics in a
pilot study quantifying the relationships between transparency, confusion, and
user perceptions in a corpus of real NLP system descriptions.
Comment: To appear at EMNLP 2021. 15 pages, 10 figures, 7 tables.
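The shape of a language model-based probabilistic metric can be illustrated with a toy stand-in. The paper's metrics use neural language models; here, purely for illustration, an add-alpha unigram model plays that role: a description's perplexity serves as a probabilistic score one could correlate with human judgments. The function name and corpus are invented for this sketch.

```python
import math
from collections import Counter

def unigram_perplexity(description, corpus, alpha=1.0):
    """Perplexity of a description under an add-alpha unigram model."""
    counts = Counter(w for text in corpus for w in text.lower().split())
    vocab = len(counts) + 1  # +1 reserves mass for unseen words
    total = sum(counts.values())
    tokens = description.lower().split()
    log_prob = sum(
        math.log((counts[w] + alpha) / (total + alpha * vocab)) for w in tokens
    )
    return math.exp(-log_prob / len(tokens))

# A description phrased in familiar terms scores lower (less surprising)
# than one packed with unseen jargon, under this toy model.
corpus = ["the system tags text", "the system answers questions"]
plain = unigram_perplexity("the system tags text", corpus)
dense = unigram_perplexity("stochastic parse lattices", corpus)
```

A neural LM replaces the unigram counts in the actual metrics, but the proxy logic (score descriptions probabilistically, then correlate with user and expert opinions) is the same.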
NeuPSL: Neural Probabilistic Soft Logic
We present Neural Probabilistic Soft Logic (NeuPSL), a novel neuro-symbolic
(NeSy) framework that unites state-of-the-art symbolic reasoning with the
low-level perception of deep neural networks. To explicitly model the boundary
between neural and symbolic representations, we introduce NeSy Energy-Based
Models, a general family of energy-based models that combine neural and
symbolic reasoning. Using this framework, we show how to seamlessly integrate
neural and symbolic parameter learning and inference. We perform an extensive
empirical evaluation and show that NeuPSL outperforms existing methods on joint
inference and has significantly lower variance in almost all settings.
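The energy-based framing can be sketched in a few lines. This is a hedged toy, not NeuPSL's actual objective: the neural energy is the negative log-probability of a label from some classifier, the symbolic energy penalizes violating a soft rule that two linked items share a label, and prediction jointly minimizes their sum. All names and numbers are invented for illustration.

```python
import math

def neural_energy(probs, label):
    """Low-level perception term: negative log-probability of a label."""
    return -math.log(probs[label])

def symbolic_energy(y1, y2, weight=2.0):
    """Symbolic term: penalize violating a soft 'same label' rule."""
    return weight * (0.0 if y1 == y2 else 1.0)

def predict(probs_a, probs_b, labels=("pos", "neg")):
    """Joint inference: pick the label pair with minimum total energy."""
    best, best_e = None, float("inf")
    for ya in labels:
        for yb in labels:
            e = (neural_energy(probs_a, ya)
                 + neural_energy(probs_b, yb)
                 + symbolic_energy(ya, yb))
            if e < best_e:
                best, best_e = (ya, yb), e
    return best

# The classifier weakly prefers different labels for the two items, but
# the symbolic rule pulls the pair toward agreement.
pair = predict({"pos": 0.6, "neg": 0.4}, {"pos": 0.45, "neg": 0.55})
```

Explicitly separating the two energy terms is what makes the boundary between neural and symbolic representations visible, which is the point of the NeSy Energy-Based Model formulation.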