Towards Understanding In-Context Learning with Contrastive Demonstrations and Saliency Maps
We investigate the role of various demonstration components in the in-context
learning (ICL) performance of large language models (LLMs). Specifically, we
explore the impacts of ground-truth labels, input distribution, and
complementary explanations, particularly when these are altered or perturbed.
We build on previous work, which offers mixed findings on how these elements
influence ICL. To probe these questions, we employ explainable NLP (XNLP)
methods and utilize saliency maps of contrastive demonstrations for both
qualitative and quantitative analysis. Our findings reveal that flipping
ground-truth labels significantly alters the saliency maps, an effect that is
more pronounced in larger LLMs. A granular analysis of the input distribution
reveals that replacing sentiment-indicative terms with neutral ones in a
sentiment analysis task has a less substantial impact than altering
ground-truth labels. Finally, we find that the effectiveness of complementary
explanations in boosting ICL performance is task-dependent, with limited
benefits seen in sentiment analysis tasks compared to symbolic reasoning tasks.
These insights are critical for understanding the functionality of LLMs and
guiding the development of effective demonstrations, which is increasingly
relevant in light of the growing use of LLMs in applications such as ChatGPT.
Our research code is publicly available at https://github.com/paihengxu/XICL.
Comment: 10 pages, 5 figures
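To make the saliency-map methodology concrete, here is a minimal sketch of gradient-x-input saliency over in-context demonstration tokens, one common XNLP attribution technique. The model (GPT-2 as a small stand-in), the toy prompts, and the attribution choice are illustrative assumptions, not the authors' exact setup from the repository above.

    # Minimal sketch: gradient-x-input saliency over ICL demonstration tokens.
    # GPT-2 and the toy prompts are stand-ins; see the paper's repository for
    # the actual experimental setup.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def demonstration_saliency(prompt: str):
        """Score each prompt token by |gradient x input| w.r.t. the top next-token logit."""
        input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
        embeds = model.transformer.wte(input_ids).detach().requires_grad_(True)
        logits = model(inputs_embeds=embeds).logits
        target = logits[0, -1].argmax()  # the model's predicted continuation
        logits[0, -1, target].backward()
        scores = (embeds.grad * embeds).sum(-1).abs().squeeze(0)
        return list(zip(tokenizer.convert_ids_to_tokens(input_ids[0]), scores.tolist()))

    # Contrastive demonstrations: the same prompt with the ground-truth label flipped.
    original = "Review: great movie\nSentiment: positive\nReview: boring plot\nSentiment:"
    flipped = "Review: great movie\nSentiment: negative\nReview: boring plot\nSentiment:"
    for name, p in [("original", original), ("flipped", flipped)]:
        print(name, sorted(demonstration_saliency(p), key=lambda t: -t[1])[:5])

Comparing which tokens gain or lose saliency between the two prompts mirrors, at toy scale, the paper's contrastive analysis of label flipping.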
The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education
Assessing instruction quality is a fundamental component of any improvement
efforts in the education system. However, traditional manual assessments are
expensive, subjective, and heavily dependent on observers' expertise and
idiosyncratic factors, preventing teachers from getting timely and frequent
feedback. Unlike prior research, which mostly focuses on individual
low-inference instructional practices, this paper presents the first
study that leverages Natural Language Processing (NLP) techniques to assess
multiple high-inference instructional practices in two distinct educational
settings: in-person K-12 classrooms and simulated performance tasks for
pre-service teachers. This is also the first study that applies NLP to measure
a teaching practice that is widely acknowledged to be particularly effective
for students with special needs. We confront two challenges inherent in
NLP-based instructional analysis: noisy, lengthy input data and highly skewed
distributions of human ratings. Our results suggest that pretrained Language
Models (PLMs) achieve performance comparable to the agreement level of human
raters for variables that are more discrete and
require lower inference, but their efficacy diminishes with more complex
teaching practices. Interestingly, using only teachers' utterances as input
yields strong results for student-centered variables, alleviating common
concerns over the difficulty of collecting and transcribing high-quality
student speech data in in-person teaching settings. Our findings highlight both
the potential and the limitations of current NLP techniques in the education
domain, opening avenues for further exploration.
Comment: NAACL 2024
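Since the abstract stays high-level, a minimal sketch of the general recipe it describes may help: a pretrained encoder scoring a long, noisy classroom transcript on a rating scale, with naive chunking to handle input length. The model choice (roberta-base), the chunk-and-average scheme, and the single regression target are assumptions for illustration; the paper's actual variables, data, and training setup are not reproduced here, and the head below would need fine-tuning on human ratings (e.g., with a loss weighted against their skew) before its scores mean anything.

    # Minimal sketch: a PLM regression head over a chunked transcript.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL = "roberta-base"  # stand-in PLM, not the paper's model
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL, num_labels=1, problem_type="regression"
    )
    model.eval()

    def score_transcript(transcript: str, max_len: int = 510) -> float:
        """Chunk a long transcript, score each chunk, and average the chunk scores."""
        ids = tokenizer(transcript, add_special_tokens=False)["input_ids"]
        chunks = [ids[i:i + max_len] for i in range(0, len(ids), max_len)]
        scores = []
        with torch.no_grad():
            for chunk in chunks:
                with_special = tokenizer.build_inputs_with_special_tokens(chunk)
                scores.append(model(input_ids=torch.tensor([with_special])).logits.item())
        return sum(scores) / len(scores)

    # Teacher-utterance-only input, per the abstract's observation that teacher
    # talk alone predicts student-centered variables well.
    print(score_transcript("Teacher: What do you notice about these two fractions?"))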
Emojis Decoded: Leveraging ChatGPT for Enhanced Understanding in Social Media Communications
Emojis, which encapsulate semantics beyond mere words or phrases, have become
prevalent in social network communications. This has spurred increasing
scholarly interest in exploring their attributes and functionalities. However,
emoji-related research and application face two primary challenges. First,
researchers typically rely on crowd-sourcing to annotate emojis in order to
understand their sentiments, usage intentions, and semantic meanings. Second,
subjective interpretations by users can often lead to misunderstandings of
emojis and create communication barriers. Large Language Models (LLMs) have
achieved significant success in various annotation tasks, with ChatGPT
demonstrating expertise across multiple domains. In our study, we assess
ChatGPT's effectiveness in handling previously annotated and downstream tasks.
Our objective is to validate the hypothesis that ChatGPT can serve as a viable
alternative to human annotators in emoji research and that its ability to
explain emoji meanings can enhance clarity and transparency in online
communications. Our findings indicate that ChatGPT has extensive knowledge of
emojis. It is adept at elucidating the meaning of emojis across various
application scenarios and demonstrates the potential to replace human
annotators in a range of tasks.
Comment: 12 pages, 2-page appendix
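As a concrete illustration of the annotation setup the abstract evaluates, here is a minimal sketch that asks a chat model to annotate an emoji's sentiment, intention, and meaning in context. The prompt wording, the gpt-4o-mini stand-in model, and the JSON label schema are assumptions for illustration; the study's exact prompts and model version may differ.

    # Minimal sketch: LLM-as-annotator for emojis, via the OpenAI chat API.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def annotate_emoji(emoji: str, context: str) -> str:
        """Ask the model to annotate one emoji as it is used in a given message."""
        prompt = (
            f"Annotate the emoji {emoji} as used in this message.\n"
            f"Message: {context}\n"
            "Return JSON with keys: sentiment (positive/neutral/negative), "
            "intention, meaning."
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # stand-in; the study used ChatGPT
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    print(annotate_emoji("🔥", "that release demo was 🔥"))

Agreement between such model outputs and existing crowd-sourced labels is the kind of comparison the abstract describes for judging whether the model can stand in for human annotators.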
Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey
Causal inference has shown potential in enhancing the predictive accuracy,
fairness, robustness, and explainability of Natural Language Processing (NLP)
models by capturing causal relationships among variables. The emergence of
generative Large Language Models (LLMs) has significantly impacted various NLP
domains, particularly through their advanced reasoning capabilities. This
survey focuses on evaluating and improving LLMs from a causal view in the
following areas: understanding and improving the LLMs' reasoning capacity,
addressing fairness and safety issues in LLMs, complementing LLMs with
explanations, and handling multimodality. Meanwhile, LLMs' strong reasoning
capacities can in turn contribute to the field of causal inference by aiding
causal relationship discovery and causal effect estimations. This review
explores the interplay between causal inference frameworks and LLMs from both
perspectives, emphasizing their collective potential to further the development
of more advanced and equitable artificial intelligence systems.