4 research outputs found
Time Travel in LLMs: Tracing Data Contamination in Large Language Models
Data contamination, i.e., the presence of test data from downstream tasks in
the training data of large language models (LLMs), is potentially a major issue
in assessing LLMs' effectiveness on other tasks. We propose a
straightforward yet effective method for identifying data contamination within
LLMs. At its core, our approach starts by identifying potential contamination
in individual instances that are drawn from a small random sample; using this
information, our approach then assesses if an entire dataset partition is
contaminated. To estimate contamination of individual instances, we employ
"guided instruction:" a prompt consisting of the dataset name, partition type,
and the initial segment of a reference instance, asking the LLM to complete it.
An instance is flagged as contaminated if the LLM's output either exactly or
closely matches the latter segment of the reference. To understand if an entire
partition is contaminated, we propose two ideas. The first idea marks a dataset
partition as contaminated if the average overlap score with the reference
instances (as measured by ROUGE or BLEURT) is statistically significantly
better with the guided instruction vs. a general instruction that does not
include the dataset and partition name. The second idea marks a dataset as
contaminated if a classifier based on GPT-4 with in-context learning prompting
marks multiple instances as contaminated. Our best method achieves an accuracy
between 92% and 100% in detecting if an LLM is contaminated with seven
datasets, containing train and test/validation partitions, when contrasted with
manual evaluation by a human expert. Further, our findings indicate that GPT-4 is
contaminated with the AG News, WNLI, and XSum datasets.
Comment: v1 preprint
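A minimal sketch of the guided- vs. general-instruction check described above, assuming a generic completion callable for the LLM, illustrative instance fields, and a paired t-test standing in for whatever significance test the authors actually use:

```python
# Sketch only: not the paper's released code. ROUGE-L overlap between the LLM's
# completion and the reference second segment is compared under two prompts.
from rouge_score import rouge_scorer
from scipy import stats

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def guided_prompt(dataset, split, first_piece):
    # Guided instruction: names the dataset and partition explicitly.
    return (f"You are given the first piece of an instance from the {split} "
            f"split of the {dataset} dataset. Complete it exactly as it "
            f"appears in the dataset.\nFirst piece: {first_piece}")

def general_prompt(first_piece):
    # General instruction: no dataset or partition name.
    return f"Complete the following text:\n{first_piece}"

def overlap_scores(completion_fn, prompts, references):
    # ROUGE-L F1 between each completion and the reference second piece.
    return [scorer.score(ref, completion_fn(p))["rougeL"].fmeasure
            for p, ref in zip(prompts, references)]

def partition_contaminated(completion_fn, instances, dataset, split, alpha=0.05):
    firsts = [x["first_piece"] for x in instances]    # illustrative field names
    seconds = [x["second_piece"] for x in instances]
    guided = overlap_scores(completion_fn,
                            [guided_prompt(dataset, split, f) for f in firsts], seconds)
    general = overlap_scores(completion_fn,
                             [general_prompt(f) for f in firsts], seconds)
    # Flag the partition if guided overlap is significantly higher than general overlap.
    _, p_value = stats.ttest_rel(guided, general, alternative="greater")
    return p_value < alpha
```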
Large Language Models As MOOCs Graders
Massive open online courses (MOOCs) unlock the doors to free education for
anyone around the globe with access to a computer and the internet. Despite
this democratization of learning, the massive enrollment in these courses means
it is almost impossible for one instructor to assess every student's writing
assignment. As a result, peer grading, often guided by a straightforward
rubric, is the method of choice. While convenient, peer grading often falls
short in terms of reliability and validity. In this study, using 18 distinct
settings, we explore the feasibility of leveraging large language models (LLMs)
to replace peer grading in MOOCs. Specifically, we focus on two
state-of-the-art LLMs: GPT-4 and GPT-3.5, across three distinct courses:
Introductory Astronomy, Astrobiology, and the History and Philosophy of
Astronomy. To instruct LLMs, we use three different prompts based on a variant
of the zero-shot chain-of-thought (Zero-shot-CoT) prompting technique:
Zero-shot-CoT combined with instructor-provided correct answers; Zero-shot-CoT
in conjunction with both instructor-formulated answers and rubrics; and
Zero-shot-CoT with instructor-offered correct answers and LLM-generated
rubrics. Our results show that Zero-shot-CoT, when integrated with
instructor-provided answers and rubrics, produces grades that are more aligned
with those assigned by instructors than peer grading does. However, the
History and Philosophy of Astronomy course proves more challenging to grade
than the other courses. Finally, our study reveals a
promising direction for automating grading systems for MOOCs, especially in
subjects with well-defined rubrics.
Comment: v1.3 preprint
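A hedged sketch of the third prompt variant above (Zero-shot-CoT combined with an instructor-provided answer and a rubric), using the OpenAI Python SDK; the prompt wording, helper function, and model settings are illustrative rather than the study's exact protocol:

```python
# Illustrative grading call, assuming the official openai>=1.0 Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def grade_submission(question, instructor_answer, rubric, student_answer, model="gpt-4"):
    prompt = (
        f"Question:\n{question}\n\n"
        f"Instructor-provided correct answer:\n{instructor_answer}\n\n"
        f"Grading rubric:\n{rubric}\n\n"
        f"Student answer:\n{student_answer}\n\n"
        # Zero-shot-CoT trigger: ask the model to reason before scoring.
        "Let's think step by step, then give a final numeric score according to the rubric."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content
```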
Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords
We propose a novel task-agnostic in-domain pre-training method that sits
between generic pre-training and fine-tuning. Our approach selectively masks
in-domain keywords, i.e., words that provide a compact representation of the
target domain. We identify such keywords using KeyBERT (Grootendorst, 2020). We
evaluate our approach using six different settings: three datasets combined
with two distinct pre-trained language models (PLMs). Our results reveal that
the fine-tuned PLMs adapted using our in-domain pre-training strategy
outperform PLMs that used in-domain pre-training with random masking as well as
those that followed the common pre-train-then-fine-tune paradigm. Further, the
overhead of identifying in-domain keywords is reasonable, e.g., 7-15% of the
pre-training time (for two epochs) for BERT Large (Devlin et al., 2019).
Comment: final version, accepted at ACL'23 RepL4NLP.
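A rough sketch of the selective masking idea, assuming KeyBERT for keyword extraction and a BERT tokenizer; the keyword count and token-level matching are simplifications, not the authors' implementation:

```python
# Sketch: extract in-domain keywords with KeyBERT, then mask only those tokens
# when building masked-language-modeling inputs for in-domain pre-training.
from keybert import KeyBERT
from transformers import AutoTokenizer

kw_model = KeyBERT()
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")

def domain_keywords(corpus_text, top_n=500):
    # KeyBERT returns (keyword, similarity) pairs ranked against the document embedding.
    pairs = kw_model.extract_keywords(corpus_text,
                                      keyphrase_ngram_range=(1, 1), top_n=top_n)
    return {kw for kw, _ in pairs}

def mask_in_domain(sentence, keywords):
    # Replace only whole tokens that match an in-domain keyword; leave the rest intact.
    tokens = tokenizer.tokenize(sentence)
    masked = [tokenizer.mask_token if t in keywords else t for t in tokens]
    return tokenizer.convert_tokens_to_string(masked)
```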
Prediction of blast loading on protruded structures using machine learning methods
Current empirical and semi-empirical design manuals are restricted to the analysis of simple building configurations against blast loading. Prediction of blast loads for complex geometries is typically carried out with computational fluid dynamics solvers, which are known for their high computational cost. Combining high-fidelity simulations with machine learning tools may significantly accelerate processing time, but the efficacy of such tools must be investigated. The present study evaluates various machine learning algorithms for predicting peak overpressure and impulse on a protruded structure exposed to blast loading. A dataset with over 250,000 data points extracted from ProSAir simulations is used to train, validate, and test the models. Among the machine learning algorithms, gradient boosting models outperformed neural networks, demonstrating high predictive power. These models also required significantly less time for hyperparameter optimization, and the randomized search approach achieved results comparable to those of grid search. Based on permutation feature importance studies, the protrusion length is a considerably more influential parameter in the construction of the decision trees than the building height.
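An illustrative sketch of a gradient-boosting surrogate with randomized hyperparameter search, using scikit-learn; the feature names, file name, and parameter grid are assumptions rather than the study's actual configuration:

```python
# Sketch: fit a gradient-boosting regressor for peak overpressure and tune it
# with randomized search, mirroring the comparison described above.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

df = pd.read_csv("prosair_blast_dataset.csv")  # hypothetical export of the simulation data
features = ["charge_mass", "standoff_distance", "protrusion_length", "building_height"]
X, y = df[features], df["peak_overpressure"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions={
        "n_estimators": [200, 500, 1000],
        "learning_rate": [0.01, 0.05, 0.1],
        "max_depth": [3, 5, 7],
    },
    n_iter=10, cv=5, scoring="neg_mean_absolute_error", random_state=0,
)
search.fit(X_train, y_train)
print("held-out R^2:", search.best_estimator_.score(X_test, y_test))
```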
Current empirical and semi-empirical based design manuals are restricted to the analysis of simple building configurations against blast loading. Prediction of blast loads for complex geometries is typically carried out with computational fluid dynamics solvers, which are known for their high computational cost. The combination of high-fidelity simulations with machine learning tools may significantly accelerate processing time, but the efficacy of such tools must be investigated. The present study evaluates various machine learning algorithms to predict peak overpressure and impulse on a protruded structure exposed to blast loading. A dataset with over 250,000 data points extracted from ProSAir simulations is used to train, validate, and test the models. Among the machine learning algorithms, gradient boosting models outperformed neural networks, demonstrating high predictive power. These models required significantly less time for hyperparameter optimization, and the randomized search approach achieved relatively similar results to that of grid search. Based on permutation feature importance studies, the protrusion length was considered a significantly more influential parameter in the construction of decision trees than building height.Immediate accessThis item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at [email protected]