Towards a general purpose machine translation system for Sranantongo
Machine translation for Sranantongo (Sranan, srn), a low-resource Creole
language spoken predominantly in Suriname, is virgin territory. In this study we
create a general purpose machine translation system for srn. In order to
facilitate this research, we introduce the SRNcorpus, a collection of parallel
Dutch (nl) to srn and monolingual srn data. We experiment with a wide range of
proven machine translation methods. Our results demonstrate a strong baseline
machine translation system for srn. Comment: Accepted to WiNLP (EMNLP), 2 pages.
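The abstract pairs a small amount of parallel nl-srn data with monolingual srn text. One standard way to exploit such monolingual data is back-translation, sketched minimally below with the Hugging Face transformers API; the srn-to-nl checkpoint path is a hypothetical placeholder, and the abstract does not state which of its "proven methods" the paper actually uses.

```python
# Minimal back-translation sketch: turn monolingual srn text into extra
# synthetic nl-srn training pairs. Hedged assumptions: the srn->nl checkpoint
# path is a hypothetical placeholder, and back-translation is only one of the
# many "proven methods" the abstract alludes to.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

SRN_NL_CHECKPOINT = "path/to/baseline-srn-nl-model"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(SRN_NL_CHECKPOINT)
model = AutoModelForSeq2SeqLM.from_pretrained(SRN_NL_CHECKPOINT)

def back_translate(srn_sentences, batch_size=16):
    """Translate monolingual srn into synthetic nl, paired with the original srn."""
    pairs = []
    for i in range(0, len(srn_sentences), batch_size):
        batch = srn_sentences[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
        synthetic_nl = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        # Synthetic nl on the source side, authentic srn on the target side.
        pairs.extend(zip(synthetic_nl, batch))
    return pairs
```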
Multilingual k-Nearest-Neighbor Machine Translation
k-nearest-neighbor machine translation has demonstrated remarkable
improvements in machine translation quality by creating a datastore of cached
examples. However, these improvements have been limited to high-resource
language pairs, with large datastores, and remain a challenge for low-resource
languages. In this paper, we address this issue by combining representations
from multiple languages into a single datastore. Our results consistently
demonstrate substantial improvements not only in low-resource translation
quality (up to +3.6 BLEU), but also for high-resource translation quality (up
to +0.5 BLEU). Our experiments also show that, by using linguistic similarities
for datastore creation, it is possible to build multilingual datastores that are
a quarter of the size while achieving a 5.3x speed improvement. Comment: Accepted to EMNLP.
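As a rough illustration of the mechanism this work builds on, the sketch below implements vanilla kNN-MT retrieval in NumPy: cached decoder states act as keys, the target tokens they produced act as values, and the retrieval distribution is interpolated with the model's own next-token distribution. Pooling entries from several languages into one set of keys and values is the multilingual variant described above; the temperature and interpolation weight here are illustrative defaults, not the paper's settings.

```python
# kNN-MT retrieval sketch (NumPy). keys: cached decoder hidden states;
# values: the target token ids those states produced. Hyperparameters are
# illustrative, not taken from the paper.
import numpy as np

def knn_distribution(query, keys, values, vocab_size, k=8, temperature=10.0):
    """Distribution over the vocabulary induced by the k nearest cached states."""
    dists = np.linalg.norm(keys - query, axis=1)     # L2 distance to every datastore entry
    nearest = np.argsort(dists)[:k]                  # indices of the k closest entries
    weights = np.exp(-dists[nearest] / temperature)  # closer entries get more mass
    weights /= weights.sum()
    probs = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        probs[values[idx]] += w                      # vote for the cached target token
    return probs

def interpolate(model_probs, knn_probs, lam=0.5):
    """Final next-token distribution: mix the retrieval and model distributions."""
    return lam * knn_probs + (1 - lam) * model_probs

# A multilingual datastore is simply keys/values concatenated across languages,
# which lets low-resource pairs borrow neighbors from related high-resource ones.
```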
Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens
We argue that translation quality alone is not a sufficient metric for
measuring knowledge transfer in multilingual neural machine translation. To
support this claim, we introduce Representational Transfer Potential (RTP),
which measures representational similarities between languages. We show that
RTP can measure both positive and negative transfer (interference), and find
that RTP is strongly correlated with changes in translation quality, indicating
that transfer does occur. Furthermore, we investigate data and language
characteristics that are relevant for transfer, and find that multi-parallel
overlap is an important yet under-explored feature. Based on this, we develop a
novel training scheme, which uses an auxiliary similarity loss that encourages
representations to be more invariant across languages by taking advantage of
multi-parallel data. We show that our method yields increased translation
quality for low- and mid-resource languages across multiple data and model
setups. Comment: Accepted to EMNLP 2023 Findings.
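The auxiliary similarity loss mentioned above can be pictured as follows: for a multi-parallel pair (the same sentence in two source languages), pooled encoder representations are pulled together by a cosine term added to the translation loss. Below is a minimal PyTorch sketch under that assumption; the exact loss form and the weight are not taken from the paper.

```python
# Sketch of an auxiliary cross-lingual similarity loss on multi-parallel data.
# The precise formulation and the weight `aux_weight` are assumptions, not the
# paper's published recipe.
import torch.nn.functional as F

def similarity_loss(enc_a, enc_b, mask_a, mask_b):
    """enc_*: (batch, seq, dim) encoder states; mask_*: (batch, seq) float masks."""
    pooled_a = (enc_a * mask_a.unsqueeze(-1)).sum(1) / mask_a.sum(1, keepdim=True)
    pooled_b = (enc_b * mask_b.unsqueeze(-1)).sum(1) / mask_b.sum(1, keepdim=True)
    # Penalize cosine distance between the two languages' sentence representations.
    return (1.0 - F.cosine_similarity(pooled_a, pooled_b, dim=-1)).mean()

def total_loss(translation_loss, enc_a, enc_b, mask_a, mask_b, aux_weight=0.1):
    """Standard translation loss plus the weighted cross-lingual invariance term."""
    return translation_loss + aux_weight * similarity_loss(enc_a, enc_b, mask_a, mask_b)
```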
How Far Can 100 Samples Go? Unlocking Overall Zero-Shot Multilingual Translation via Tiny Multi-Parallel Data
Zero-shot translation aims to translate between language pairs not seen
during training in Multilingual Machine Translation (MMT) and is largely
considered an open problem. A common, albeit resource-consuming, solution is to
add as many related translation directions as possible to the training corpus.
In this paper, we show that for an English-centric model, surprisingly large
zero-shot improvements can be achieved by simply fine-tuning with a very small
amount of multi-parallel data. For example, on the EC30 dataset, we obtain up
to a +21.7 ChrF improvement in overall non-English performance (870 directions)
by using only 100 multi-parallel samples, while preserving English-centric
translation quality. When investigating the size effect of the fine-tuning data
and its transfer capabilities, we find that even a small, randomly sampled set
of fine-tuning directions is sufficient to achieve comparable improvements. The
resulting non-English performance is close to the complete translation upper
bound. Even in a minimal setting -- fine-tuning with only a single sample --
the well-known off-target issue is almost completely resolved, explaining part
-- but not all -- of the observed improvements in translation quality. Comment: 15 pages, 5 figures.
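To make the data construction concrete, the sketch below builds the tiny multi-parallel fine-tuning set described above: each multi-parallel sample contributes one pair per non-English direction, while the original English-centric data stays untouched. The field names and language codes are placeholders rather than the released EC30 format.

```python
# Build non-English fine-tuning pairs from a handful of multi-parallel samples.
# Field names and the language list are illustrative placeholders.
from itertools import permutations

def build_multiparallel_pairs(samples, languages):
    """samples: list of dicts mapping language code -> the same sentence in that language."""
    pairs = []
    for sample in samples:
        for src, tgt in permutations(languages, 2):
            if src == "en" or tgt == "en":
                continue  # English-centric directions are already covered by the base data
            pairs.append({"src_lang": src, "tgt_lang": tgt,
                          "src": sample[src], "tgt": sample[tgt]})
    return pairs

# With English plus 30 other languages, 100 samples yield fine-tuning pairs
# spanning all 30 * 29 = 870 non-English directions.
```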
Data Contamination Report from the 2024 CONDA Shared Task
The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant
aspects of data contamination in natural language processing, where data
contamination is understood as the inclusion of evaluation data in pre-training
corpora used to train large-scale models, compromising evaluation results. The
workshop fostered a shared task to collect evidence on data contamination in
currently available datasets and models. The goal of the shared
task and associated database is to assist the community in understanding the
extent of the problem and to assist researchers in avoiding reporting
evaluation results on known contaminated resources. The shared task provides a
structured, centralized public database for the collection of contamination
evidence, open to contributions from the community via GitHub pull requests.
This first compilation paper is based on 566 reported entries over 91
contaminated sources from a total of 23 contributors. The details of the
individual contamination events are available on the platform. The platform
remains online and open to contributions from the community. Comment: https://huggingface.co/spaces/CONDA-Workshop/Data-Contamination-Database
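For readers unfamiliar with what a contamination report typically boils down to, the sketch below shows one common style of evidence: flagging an evaluation example whose long n-grams also occur in a pre-training corpus. The 13-gram threshold is a frequently used heuristic, not something mandated by the shared task.

```python
# Toy n-gram overlap check of the kind often cited as contamination evidence.
# The n-gram length is a common heuristic, not prescribed by CONDA.
def ngrams(text, n=13):
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def index_pretraining_corpus(docs, n=13):
    """Collect every n-gram of the pre-training corpus (sharded/hashed in practice)."""
    index = set()
    for doc in docs:
        index |= ngrams(doc, n)
    return index

def is_contaminated(eval_example, pretraining_index, n=13):
    """True if any n-gram of the evaluation example appears verbatim in pre-training data."""
    return not ngrams(eval_example, n).isdisjoint(pretraining_index)
```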
Humanity's Last Exam
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 3,000 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai
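As a rough picture of the automated grading such a benchmark enables, the sketch below scores short answers by normalized exact match and derives a simple calibration measure from self-reported confidences; the record format is illustrative, not HLE's released schema.

```python
# Toy automated grading for closed-ended questions with known answers, plus a
# simple calibration measure. The record format is an illustrative assumption.
def grade(prediction, answer):
    """Exact match after light normalization; each question's answer is unambiguous."""
    return prediction.strip().lower() == answer.strip().lower()

def accuracy_and_calibration(records):
    """records: list of (prediction, answer, confidence) with confidence in [0, 1]."""
    correct = [grade(pred, ans) for pred, ans, _ in records]
    accuracy = sum(correct) / len(records)
    # Mean absolute gap between stated confidence and actual correctness
    # (a crude stand-in for the binned calibration error usually reported).
    gaps = [abs(conf - float(ok)) for (_, _, conf), ok in zip(records, correct)]
    return accuracy, sum(gaps) / len(gaps)
```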
Ecological and evolutionary response of Tethyan planktonic foraminifera to the middle Eocene climatic optimum (MECO) from the Alano section (NE Italy)
The enigmatic middle Eocene climatic optimum (MECO) is a transient (~500 kyr) warming event that, at ~40 Ma, significantly interrupted the long-term cooling through the middle and late Eocene that eventually resulted in the establishment of a permanent Antarctic ice sheet. This event is still poorly known and data on the biotic response are so far scarce. Here we present a detailed planktonic foraminiferal analysis of the MECO interval from a marginal basin of the central-western Tethys (Alano section, northeastern Italy). The expanded and continuous Alano section provides an excellent record of this event and offers an appealing opportunity to better understand the role of climate in calcareous plankton evolution. A sapropel-like interval, characterized by excursions in both the carbon and oxygen bulk-carbonate isotope records, represents the lithological expression of the post-MECO event in the study area and follows the negative δ18O shift interpreted as representing the MECO warming.
High-resolution quantitative analysis performed on both the >38 μm and >63 μm fractions reveals pronounced and complex changes in planktonic foraminiferal assemblages, indicating a strong environmental perturbation that parallels the variations of the stable isotope curves corresponding to the MECO and post-MECO intervals. These changes consist primarily of a marked increase in abundance of the relatively eutrophic subbotinids and of the small, low-oxygen-tolerant Streptochilus, Chiloguembelina and Pseudohastigerina. At the same time, the abundant opportunistic eutrophic Jenkinsina and Pseudoglobigerinella bolivariana, species typical of very high-productivity areas, also arrive. The pronounced shift from oligotrophic to more eutrophic, opportunistic, low-oxygen-tolerant planktonic foraminiferal assemblages suggests increased nutrient input and surface-ocean productivity in response to the environmental perturbation associated with the MECO. Particularly critical environmental conditions were reached during the deposition of the sapropel-like beds, as testified by the presence of common giant and/or odd morphotypes. This is interpreted as evidence of a transient alteration in ocean chemistry.
The enhanced surface-water productivity inferred from planktonic foraminiferal assemblages at the onset of the event should have resulted in heavier δ13C values. The recorded lightening of the carbon stable isotopes preceding the maximum warmth therefore represents a robust indication that it derives principally from a conspicuous increase in pCO2. The increased productivity of surface waters, also supported by geochemical data, may have acted as a mechanism for pCO2 reduction and returned the climate system to the general Eocene cooling trend. The oxygen-depleted deep waters and the organic-carbon burial following the peak of the MECO event represent the local response to the MECO warming and suggest that high sequestration of organic matter, if it represents a widespread response to this event, might have contributed to the decrease of pCO2 as well. Though the true mechanisms are still obscure, several lines of evidence indicate a potential pressure on planktonic foraminiferal evolution during the MECO event, including permanent changes in addition to transient, ecologically controlled variations.
The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities
Fine-tuning large language models (LLMs) for machine translation has shown
improvements in overall translation quality. However, it is unclear how
fine-tuning affects desirable LLM behaviors that are not present in neural
machine translation models, such as steerability, inherent document-level
translation abilities, and the ability to produce less literal translations. We
perform an extensive translation evaluation on the LLaMA and Falcon families of
models, with model sizes ranging from 7 billion to 65 billion parameters. Our
results show that while fine-tuning improves the general translation quality of
LLMs, several abilities degrade. In particular, we observe a decline in the
ability to perform formality steering, to produce technical translations
through few-shot examples, and to perform document-level translation. On the
other hand, we observe that the model produces less literal translations after
fine-tuning on parallel data. We show that by including monolingual data as
part of the fine-tuning data, we can maintain these abilities while simultaneously
enhancing overall translation quality. Our findings emphasize the need for
fine-tuning strategies that preserve the benefits of LLMs for machine
translation. Comment: Accepted to ACL 2024 (long, main). The latest version includes a link to the IdiomsInCtx-MT dataset.
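The mitigation reported above, mixing monolingual data into the fine-tuning set, could look roughly like the sketch below; the prompt template, mixing ratio, and field names are assumptions, not the paper's exact recipe.

```python
# Mix monolingual examples into the parallel fine-tuning data so the LLM keeps
# its general abilities. Template, ratio, and field names are illustrative.
import random

def build_finetuning_set(parallel_pairs, monolingual_texts, mono_ratio=0.5, seed=0):
    """parallel_pairs: list of (source, target); monolingual_texts: list of str."""
    rng = random.Random(seed)
    examples = [{"prompt": f"Translate to English: {src}", "completion": tgt}
                for src, tgt in parallel_pairs]
    n_mono = min(int(len(examples) * mono_ratio), len(monolingual_texts))
    for text in rng.sample(monolingual_texts, n_mono):
        # Plain language-modeling examples help preserve non-translation behavior.
        examples.append({"prompt": "", "completion": text})
    rng.shuffle(examples)
    return examples
```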
