Unleashing the Multilingual Encoder Potential: Boosting Zero-Shot Performance via Probability Calibration
Pretrained multilingual encoder models can directly perform zero-shot
multilingual tasks or linguistic probing by reformulating the input examples
into cloze-style prompts. This is accomplished by predicting the probabilities
of the label words at the masked token position, without requiring any updates
to the model parameters. However, the performance of this method is limited by the model's bias toward predicting label words that occurred frequently during pretraining and therefore receive high probabilities. To address this issue, we combine the models with calibration techniques that modify the probabilities of the label words predicted by the models. We first validate the
effectiveness of a proposed simple calibration method together with other
existing techniques on monolingual encoders in both zero- and few-shot
scenarios. We subsequently employ these calibration techniques on multilingual
encoders, resulting in substantial performance improvements across a wide range
of tasks.
Comment: Accepted to Findings of EMNLP 202
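A minimal sketch of the cloze-plus-calibration setup described above, assuming an XLM-R encoder, hand-picked English verbalizers, and content-free calibration (one common technique; the paper's own calibration method is not spelled out in the abstract):

```python
# A minimal sketch, assuming xlm-roberta-base, hypothetical verbalizers, and
# content-free ("N/A") calibration; not the paper's exact method.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "xlm-roberta-base"  # assumed multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

label_words = {"positive": "good", "negative": "bad"}  # hypothetical verbalizers

def label_word_id(word: str) -> int:
    # Assumes the label word maps to a single leading SentencePiece token.
    return tokenizer(" " + word, add_special_tokens=False).input_ids[0]

def label_probs(text: str) -> dict:
    """Probability of each label word at the masked token position."""
    prompt = f"{text} It was {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        probs = model(**inputs).logits[0, mask_pos].softmax(dim=-1)
    return {label: probs[label_word_id(word)].item()
            for label, word in label_words.items()}

# The prior from a content-free input captures the bias toward frequent label words.
prior = label_probs("N/A")
raw = label_probs("The movie was surprisingly enjoyable.")

# Calibration: down-weight label words the model favours regardless of the input.
calibrated = {label: raw[label] / prior[label] for label in raw}
print(max(calibrated, key=calibrated.get))
```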
Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages
Multilingual Pretrained Language Models (MPLMs) have demonstrated strong multilinguality in recent empirical cross-lingual transfer studies. In this
paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC)
pipeline to improve the zero-shot performance on low-resource languages (LRLs)
by augmenting the context with semantically similar sentences retrieved from a
high-resource language (HRL) as prompts. PARC improves the zero-shot
performance on three downstream tasks (binary sentiment classification, topic
categorization and natural language inference) with multilingual parallel test
sets across 10 LRLs covering 6 language families in both unlabeled settings
(+5.1%) and labeled settings (+16.3%). PARC-labeled also outperforms the
finetuning baseline by 3.7%. We find a significant positive correlation between cross-lingual transfer performance and both the similarity between the high- and low-resource languages and the amount of low-resource pretraining data. A robustness analysis suggests that PARC has the potential to achieve even stronger performance with more powerful MPLMs.
Comment: Accepted to Findings of ACL 202
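A minimal sketch of the retrieval-augmented prompting idea described above, assuming a multilingual sentence-transformer retriever, a toy labeled English (HRL) pool, and a simple cloze template; none of these reflect the paper's exact configuration:

```python
# A minimal sketch, assuming a multilingual sentence-transformer retriever and a tiny
# hypothetical labeled HRL pool; the actual PARC retriever, templates, and data differ.
from sentence_transformers import SentenceTransformer, util

retriever = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed retriever

hrl_pool = [  # hypothetical labeled HRL (English) sentiment examples
    ("The film was wonderful and moving.", "good"),
    ("A dull, poorly acted disappointment.", "bad"),
]

def build_parc_prompt(lrl_sentence: str, k: int = 1, mask_token: str = "<mask>") -> str:
    """Prepend the k most similar labeled HRL examples to a cloze prompt for the LRL input."""
    query_emb = retriever.encode(lrl_sentence, convert_to_tensor=True)
    pool_embs = retriever.encode([s for s, _ in hrl_pool], convert_to_tensor=True)
    scores = util.cos_sim(query_emb, pool_embs)[0]
    top = scores.topk(min(k, len(hrl_pool))).indices.tolist()
    context = " ".join(f"{hrl_pool[i][0]} It was {hrl_pool[i][1]}." for i in top)
    # A multilingual encoder then predicts the label word at the mask position.
    return f"{context} {lrl_sentence} It was {mask_token}."

print(build_parc_prompt("Filamu hii ilikuwa nzuri sana."))  # example low-resource-language input
```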
Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models
Large Language Models (LLMs) demonstrate remarkable performance on a variety
of natural language understanding (NLU) tasks, primarily due to their
in-context learning ability. This ability could be applied to building baby-like models, i.e., models at small scales, improving training efficiency. In this
paper, we propose a "CoThought" pipeline, which efficiently trains smaller
"baby" language models (BabyLMs) by leveraging the Chain of Thought prompting
of LLMs. Our pipeline restructures a dataset of less than 100M words using GPT-3.5-turbo, transforming it into task-oriented, human-readable texts comparable to school texts for language learners. The BabyLM is then
pretrained on this restructured dataset in a RoBERTa fashion. In evaluations
across 4 benchmarks, our BabyLM outperforms the vanilla RoBERTa in 10
linguistic, NLU, and question-answering tasks by more than 3 points, showing a
superior ability to extract contextual information. These results suggest that
compact LMs pretrained on small, LLM-restructured data can better understand
tasks and achieve improved performance.Comment: CoNLL 2023 BabyLM Challeng