Analysis and Applications of Cross-Lingual Models in Natural Language Processing

Authors: Yoshinari Fujinuma
Publication date: 17 November 2021
Publisher: University of Colorado Boulder

Abstract

Human languages vary both typologically and in data availability. A typical machine learning-based approach to natural language processing (NLP) requires training data from the language of interest. However, because machine learning-based approaches rely heavily on the amount of data available in each language, models trained for languages without a large amount of data are of poor quality. One way to overcome the lack of data in a given language is cross-lingual transfer learning from resource-rich languages to resource-scarce languages. Cross-lingual word embeddings and multilingual contextualized embeddings are commonly used for such transfer. However, the lack of resources still makes it challenging to either evaluate or improve such models. This dissertation first proposes a graph-based method to overcome the lack of evaluation data in low-resource languages by focusing on the structure of cross-lingual word embeddings. It then discusses approaches to improve cross-lingual transfer learning by using retrofitting methods and by focusing on a specific task. Finally, it provides an analysis of the effect of adding different languages when pretraining multilingual models.

CU Scholar Institutional Repository

oai:cuscholar:sn009z968