A Note on the Unconditional Bias of the Nadaraya-Watson Regression Estimator
In this note we investigate the order of the unconditional bias of the Nadaraya-Watson (Nadaraya, 1964; Watson, 1964) estimator for multivariate regression. Surprisingly, previous attempts at establishing this result are either imprecise and technically deficient, or of limited use given the assumptions imposed (see, inter alia, Glad (1998), Mack and Müller (1988), Pagan and Ullah (1999), and Scott (2015)). The results are also often conflicting (see, inter alia, Choi et al. (2000), Chu and Marron (1991), Collomb (1981), and Glad (1998)). Unfortunately, our result here is also incomplete, but we highlight the issues and suggest further ideas to resolve them.
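For reference (this is the textbook form, not notation taken from the note itself): given i.i.d. observations $(X_i, Y_i)$, a kernel $K$, and a bandwidth $h$, the Nadaraya-Watson estimator of $m(x) = E[Y \mid X = x]$ is

```latex
\hat{m}_h(x) = \frac{\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right) Y_i}
                    {\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)}
```

In the multivariate case, $x, X_i \in \mathbb{R}^d$ and $K$ is typically a product kernel or uses a bandwidth matrix $H$ in place of the scalar $h$.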
Active Keyword Selection to Track Evolving Topics on Twitter
How can we study social interactions on evolving topics at a mass scale? Over
the past decade, researchers from diverse fields such as economics, political
science, and public health have often done this by querying Twitter's public
API endpoints with hand-picked topical keywords to search or stream
discussions. However, despite the API's accessibility, it remains difficult to
select and update keywords to collect high-quality data relevant to topics of
interest. In this paper, we propose an active learning method for rapidly
refining query keywords to increase both the yielded topic relevance and
dataset size. We leverage a large open-source COVID-19 Twitter dataset to
illustrate the applicability of our method in tracking Tweets around the key
sub-topics of Vaccine, Mask, and Lockdown. Our experiments show that our method
achieves an average topic-related keyword recall 2x higher than baselines. We
open-source our code along with a web interface for keyword selection to make
data collection from Twitter more systematic for researchers.
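The abstract gives no implementation detail, but the active-learning loop it describes can be sketched roughly as follows; collect_tweets and is_relevant are hypothetical stand-ins for the API query and the relevance oracle (a human label or a classifier), not the authors' code:

```python
# Rough sketch of one keyword-refinement round (hypothetical helpers).
from collections import Counter

def refine_keywords(seed_keywords, collect_tweets, is_relevant, top_k=5):
    """One active-learning round: collect tweets with the current keyword
    set, label them, and promote terms that skew toward relevant tweets."""
    relevant, irrelevant = Counter(), Counter()
    for text in collect_tweets(seed_keywords):      # query the Twitter API
        counts = relevant if is_relevant(text) else irrelevant
        counts.update(word.lower() for word in text.split())
    # Score candidate keywords by how strongly they co-occur with relevance.
    scores = {
        word: relevant[word] / (relevant[word] + irrelevant[word] + 1)
        for word in relevant if word not in seed_keywords
    }
    top_new = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return set(seed_keywords) | set(top_new)
```

Iterating this round grows both the keyword set and the collected dataset, which is the yield/relevance trade-off the paper measures.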
Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4
Misinformation poses a critical societal challenge, and current approaches
have yet to produce an effective solution. We propose focusing on
generalization, soft classification, and leveraging recent large language
models to create more practical tools in contexts where perfect predictions
remain unattainable. We begin by demonstrating that GPT-4 and other language
models can outperform existing methods in the literature. Next, we explore
their generalization, revealing that GPT-4 and RoBERTa-large exhibit critical
differences in failure modes, which offer potential for significant performance
improvements. Finally, we show that these models can be employed in soft
classification frameworks to better quantify uncertainty. We find that models
with inferior hard classification results can achieve superior soft
classification performance. Overall, this research lays groundwork for future
tools that can drive real-world progress on misinformation.
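As a minimal illustration of what soft classification buys here (the numbers below are invented, not the paper's results): keep the model's probability instead of a thresholded label, and score calibration with a proper scoring rule such as the Brier score.

```python
# Soft classification sketch: evaluate elicited probabilities directly
# with the Brier score (mean squared error against 0/1 ground truth).
def brier_score(probs, labels):
    """Lower is better; rewards calibrated uncertainty over bold guesses."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

probs = [0.9, 0.2, 0.6]   # hypothetical P(misinformation) per claim
labels = [1, 0, 1]        # ground truth: 1 = misinformation
print(f"Brier score: {brier_score(probs, labels):.3f}")   # 0.070
```

Under such a metric a model with worse hard accuracy can still score better if its probabilities are well calibrated, which is the distinction the abstract draws.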
Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation Mitigation
Large Language Models have emerged as prime candidates to tackle
misinformation mitigation. However, existing approaches struggle with
hallucinations and overconfident predictions. We propose an uncertainty
quantification framework that leverages both direct confidence elicitation and
sample-based consistency methods to provide better calibration for NLP
misinformation mitigation solutions. We first investigate the calibration of
sample-based consistency methods that exploit distinct features of consistency
across sample sizes and stochastic levels. Next, we evaluate the performance
and distributional shift of a robust numeric verbalization prompt across
single- vs. two-step confidence elicitation procedures. We also compare the
performance
of the same prompt with different versions of GPT and different numerical
scales. Finally, we combine the sample-based consistency and verbalized methods
to propose a hybrid framework that yields a better uncertainty estimation for
GPT models. Overall, our work proposes novel uncertainty quantification methods
that will improve the reliability of Large Language Models in misinformation
mitigation applications.
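A minimal sketch of the hybrid idea, assuming two hypothetical model calls: generate(prompt), which samples one answer under stochastic decoding, and elicit_confidence(prompt), which asks the model to verbalize a confidence in [0, 1]:

```python
# Hybrid uncertainty sketch: blend sample-based consistency with a
# verbalized confidence (both model calls are hypothetical stand-ins).
from collections import Counter

def sample_consistency(generate, prompt, n=10):
    """Return the modal answer and the fraction of samples agreeing with it."""
    answers = [generate(prompt) for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n

def hybrid_confidence(generate, elicit_confidence, prompt, weight=0.5):
    answer, consistency = sample_consistency(generate, prompt)
    verbalized = elicit_confidence(prompt)   # model states a number in [0, 1]
    return answer, weight * consistency + (1 - weight) * verbalized
```

The weighting here is a simple linear blend for illustration; the paper's actual combination rule may differ.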
Comparing GPT-4 and Open-Source Language Models in Misinformation Mitigation
Recent large language models (LLMs) have been shown to be effective for
misinformation detection. However, the choice of LLMs for experiments varies
widely, leading to uncertain conclusions. In particular, GPT-4 is known to be
strong in this domain, but it is closed source, potentially expensive, and can
show instability between different versions. Meanwhile, alternative LLMs have
given mixed results. In this work, we show that Zephyr-7b presents a
consistently viable alternative, overcoming key limitations of commonly used
approaches like Llama-2 and GPT-3.5. This provides the research community with
a solid open-source option and shows open-source models are gradually catching
up on this task. We then highlight how GPT-3.5 exhibits unstable performance,
such that this very widely used model could provide misleading results in
misinformation detection. Finally, we validate new tools including approaches
to structured output and the latest version of GPT-4 (Turbo), showing they do
not compromise performance, thus unlocking them for future research and
potentially enabling more complex pipelines for misinformation mitigation.
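The structured-output approach mentioned above amounts to constraining the model to a schema and validating the reply before it enters a pipeline; the prompt wording and label set below are illustrative, not the paper's:

```python
# Structured-output sketch: demand JSON and validate it, so downstream
# code never has to parse free-form text (schema is illustrative).
import json

PROMPT = ('Classify the claim as misinformation. Respond only with JSON: '
          '{"label": "misinfo" | "not_misinfo", "confidence": <0-1>}')

def parse_verdict(raw_response):
    """Raise if the model's reply violates the expected schema."""
    data = json.loads(raw_response)
    assert data["label"] in {"misinfo", "not_misinfo"}
    assert 0.0 <= float(data["confidence"]) <= 1.0
    return data

print(parse_verdict('{"label": "not_misinfo", "confidence": 0.85}'))
```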
Open, Closed, or Small Language Models for Text Classification?
Recent advancements in large language models have demonstrated remarkable
capabilities across various NLP tasks. But many questions remain, including
whether open-source models match closed ones, why these models excel or
struggle with certain tasks, and what types of practical procedures can improve
performance. We address these questions in the context of classification by
evaluating three classes of models using eight datasets across three distinct
tasks: named entity recognition, political party prediction, and misinformation
detection. While larger LLMs often lead to improved performance, open-source
models can rival their closed-source counterparts by fine-tuning. Moreover,
supervised smaller models, like RoBERTa, can achieve similar or even greater
performance in many datasets compared to generative LLMs. On the other hand,
closed models maintain an advantage in hard tasks that demand the most
generalizability. This study underscores the importance of model selection
based on task requirements.
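For concreteness, fine-tuning a supervised smaller model like RoBERTa for such a classification task looks roughly like the following Hugging Face Transformers sketch; the two-example dataset and the hyperparameters are placeholders, not the paper's experimental setup:

```python
# Minimal sketch of fine-tuning RoBERTa for text classification.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)

# Placeholder data: any dataset with "text" and "label" columns works.
train = Dataset.from_dict({
    "text": ["the earth is flat", "water boils at 100 C at sea level"],
    "label": [1, 0],   # 1 = misinformation
}).map(lambda b: tokenizer(b["text"], truncation=True, padding="max_length"),
       batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train,
)
trainer.train()
```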
Quantifying learning-style adaptation in effectiveness of LLM teaching
This preliminary study aims to investigate whether AI, when prompted based on individual learning styles, can effectively improve comprehension and learning experiences in educational settings. It involves tailoring LLMs' baseline prompts and comparing the results of a control group receiving standard content with those of an experimental group receiving learning style-tailored content. Preliminary results suggest that GPT-4 can generate responses aligned with various learning styles, indicating the potential for enhanced engagement and comprehension. However, these results also reveal challenges, including the model's tendency toward sycophantic behavior and variability in responses. Our findings suggest that a more sophisticated approach is required for integrating AI into education (AIEd) to improve educational outcomes.
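The tailoring step can be pictured as prepending a style-specific instruction to a shared baseline prompt; the style descriptions below are illustrative, not the study's actual prompts:

```python
# Sketch of learning-style prompt tailoring (illustrative prompts only).
STYLE_PREFIXES = {
    "visual": "Explain with described diagrams and spatial analogies.",
    "verbal": "Explain in clear, step-by-step written prose.",
    "kinesthetic": "Explain through activities the learner can try.",
}

def tailor_prompt(baseline_prompt, style=None):
    """Control group gets the bare baseline; others get a style prefix."""
    prefix = STYLE_PREFIXES.get(style, "")
    return f"{prefix}\n{baseline_prompt}".strip()

print(tailor_prompt("Teach the concept of recursion.", "visual"))
```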
Adversarial Policies Beat Superhuman Go AIs
We attack the state-of-the-art Go-playing AI system KataGo by training
adversarial policies against it, achieving a >97% win rate against KataGo
running at superhuman settings. Our adversaries do not win by playing Go well.
Instead, they trick KataGo into making serious blunders. Our attack transfers
zero-shot to other superhuman Go-playing AIs, and is comprehensible to the
extent that human experts can implement it without algorithmic assistance to
consistently beat superhuman AIs. The core vulnerability uncovered by our
attack persists even in KataGo agents adversarially trained to defend against
our attack. Our results demonstrate that even superhuman AI systems may harbor
surprising failure modes. Example games are available at https://goattack.far.ai/.
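Conceptually, the attack trains only the adversary while the victim stays frozen. A skeletal sketch of that loop follows; env, attacker, and victim are hypothetical stand-ins, and the paper's actual KataGo setup is far more involved:

```python
# Adversarial-policy training sketch: only the attacker learns; the
# superhuman victim is frozen (all objects are hypothetical stand-ins).
def play_episode(env, attacker, victim):
    """Alternate moves until the game ends; return +1 if the attacker wins."""
    state, attackers_turn = env.reset(), True
    while not env.done(state):
        agent = attacker if attackers_turn else victim
        state = env.step(state, agent.act(state))
        attackers_turn = not attackers_turn
    return env.score(state)   # +1 attacker win, -1 attacker loss

def train_attacker(env, attacker, victim, episodes=10_000):
    for _ in range(episodes):
        outcome = play_episode(env, attacker, victim)
        attacker.update(outcome)   # e.g. a policy-gradient step; victim never updates
```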