Joint translation and unit conversion for end-to-end localization
A variety of natural language tasks require processing textual data that
contains a mix of natural language and formal languages, such as mathematical
expressions. In this paper, we take unit conversions as an example and propose
a data augmentation technique that leads to models learning both the translation
and conversion tasks, as well as how to adequately switch between them, for
end-to-end localization.
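To make the idea of augmenting parallel data with unit conversions concrete, here is a minimal sketch. It is not the procedure from the paper, whose details are not given in this abstract. The template sentences, the function names (`miles_to_km`, `format_de`, `augment`), the value range, and the rounding to one decimal place are all assumptions chosen for illustration.

```python
import random

# 1 mile = 1.609344 km (exact by definition of the international mile).
KM_PER_MILE = 1.609344


def miles_to_km(miles, ndigits=1):
    """Convert miles to kilometres, rounded for display (rounding is an assumption)."""
    return round(miles * KM_PER_MILE, ndigits)


def format_de(value):
    """Render a number with a decimal comma, as is conventional in German text."""
    return f"{value:.1f}".replace(".", ",")


def augment(src_template, tgt_template, n, seed=0):
    """Fill a parallel template pair with n random values and their conversions.

    The source side keeps the original unit and value; the target side carries
    the converted value, so a model trained on these pairs sees translation and
    conversion together.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        miles = rng.randint(1, 500)  # hypothetical value range
        pairs.append((
            src_template.format(value=miles),
            tgt_template.format(value=format_de(miles_to_km(miles))),
        ))
    return pairs


# Hypothetical English -> German template pair, for illustration only.
examples = augment(
    "The trail is {value} miles long.",
    "Der Weg ist {value} Kilometer lang.",
    n=3,
)
for src, tgt in examples:
    print(src, "=>", tgt)
```

Sampling many values per template is one plausible way to expose a model to the conversion as a function rather than as a handful of memorized number pairs.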
Number Translation and Unit Conversion Using Machine Learning
Machine translation is widely used to translate text between language pairs. Applications of automatic translation include content localization. Different regions of the world use different measurement units (e.g., acre vs. hectare). Correctly converting and translating measurement units is thus an important part of content localization. Current machine translation models have low accuracy when translating numbers and are unable to handle unit conversions. This disclosure describes techniques to train a machine learning model such that it can generate accurate translations of numbers, including unit conversions. A base model is trained on tokenized input text in which numbers are split into individual digits. The parameters of the trained base model are then used to initialize a custom model, which is fine-tuned on training data augmented with annotations, e.g., different values and units for each measurement in the source text. The trained custom model can deliver correct number translations and unit conversions and can be used for content localization.
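The digit-splitting step can be illustrated with a small sketch. The disclosure's actual tokenizer is not specified in this abstract, so the whitespace-based splitting and the function name `tokenize_digits` are assumptions. The only property the sketch demonstrates is that every number is broken into single-digit tokens.

```python
import re


def tokenize_digits(text):
    """Whitespace-tokenize text, then split every run of digits into single digits.

    Non-digit material (letters, punctuation, decimal points) is kept as-is,
    so "12.5" becomes "1", "2", ".", "5".
    """
    tokens = []
    for piece in text.split():
        # Separate each whitespace-delimited piece into digit and non-digit runs.
        for part in re.findall(r"\d+|\D+", piece):
            if part.isdigit():
                tokens.extend(part)  # one token per digit
            else:
                tokens.append(part)
    return tokens


print(tokenize_digits("The farm covers 120 acres"))
print(tokenize_digits("about 12.5 km"))
```

With numbers represented digit by digit, the model sees a small, closed vocabulary for numerals instead of treating each distinct number as a separate, often rare, token.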