Machine translation (MT) of Arabic dialects presents unique challenges, mainly due to their rich cultural context and the scarcity of linguistic resources. While Large Language Models (LLMs) such as ChatGPT, LLaMA, and BLOOM have demonstrated remarkable capabilities across a range of MT tasks, their effectiveness in translating culturally embedded dialects remains largely unexplored. This thesis investigates the effectiveness of LLMs in translating the Lebanese dialect, a prominent Arabic variant of the Levant region known for its rich cultural heritage and complex idiomatic language.
A key limitation in dialectal MT is the scarcity of culturally representative datasets needed to develop effective models. The few existing Lebanese-English parallel datasets suffer from cultural misalignment because they were translated from non-native resources. To address this gap, this research introduces two culturally aware resources, LW and LebEval, derived from authentic Lebanese podcasts and professionally translated into English. It further investigates the advantage of collecting such authentic datasets through comprehensive experiments comparing Arabic-centric LLMs against NMT systems. Findings reveal that while both architectures perform similarly on non-native datasets, LLMs demonstrate superior capabilities in preserving cultural nuances, outperforming NMT systems by a significant margin on the LW data.

Additionally, while fine-tuning LLMs with instructional data has shown promising results in MT tasks, little to no effort has been dedicated to adapting these techniques specifically for Arabic or its diverse dialects. This work explores fine-tuning the open-source Aya23 model on three types of instructions: 1) parallel Lebanese/English instructions, 2) contrastive instructions, and 3) grammar-hint instructions. Results demonstrate that models fine-tuned on a smaller but culturally aware Lebanese dataset (LW) consistently outperform those trained on larger, non-native data. They also show the superiority of fine-tuning with contrastive instructions, highlighting the value of leveraging translation errors.

Finally, whereas most studies on the translation of Arabic dialects rely on the statistical evaluation metric BLEU despite its well-documented limitations, this research instead conducts a human correlation analysis across several evaluation metrics. Findings validate the shortcomings of BLEU and establish xCOMET as a more reliable and culturally sensitive metric for evaluating MT quality in this domain.
Overall, this thesis makes significant contributions to culturally aware dialectal MT, highlighting the potential of leveraging LLMs and challenging the prevailing "more data is better" paradigm.