2 research outputs found

    Joint translation and unit conversion for end-to-end localization

    A variety of natural language tasks require processing textual data that mixes natural language with formal languages such as mathematical expressions. In this paper, we take unit conversions as an example and propose a data augmentation technique that leads models to learn both the translation and the conversion task, as well as how to switch between them adequately for end-to-end localization.
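The abstract does not spell out the augmentation procedure, so the following is only a minimal sketch of one plausible reading: synthetic sentence pairs are generated from a template, with the target side carrying the converted value. The function name `augment_pairs`, the `{value}` placeholder, the value range, and the two-decimal rounding are all assumptions made for illustration, not details from the paper.

```python
import random

# 1 international acre expressed in hectares (4046.8564224 m^2 / 10000 m^2).
ACRE_TO_HECTARE = 0.40468564224


def augment_pairs(src_template, tgt_template, n, seed=0):
    """Generate n synthetic (source, target) pairs with converted measurements.

    Hypothetical helper: both templates must contain a "{value}" placeholder.
    The source side gets a random acre count, the target side the same
    area in hectares, rounded to two decimals.
    """
    rng = random.Random(seed)  # seeded so the generated data is reproducible
    pairs = []
    for _ in range(n):
        acres = rng.randint(1, 500)
        hectares = round(acres * ACRE_TO_HECTARE, 2)
        pairs.append(
            (src_template.format(value=acres), tgt_template.format(value=hectares))
        )
    return pairs


if __name__ == "__main__":
    for src, tgt in augment_pairs("The plot covers {value} acres.",
                                  "The plot covers {value} hectares.", 3):
        print(src, "->", tgt)
```

In a real localization setting the target template would be in the target language; the same language is used on both sides here only to keep the sketch readable.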

    Number Translation and Unit Conversion Using Machine Learning

    Machine translation is widely used to translate text between language pairs, and its applications include content localization. Different regions of the world use different measurement units (e.g., acre vs. hectare), so correctly converting and translating measurement units is an important part of content localization. Current machine translation models have low accuracy when translating numbers and are unable to handle unit conversions. This disclosure describes techniques to train a machine learning model such that it can generate accurate translations of numbers, including unit conversions. A base model is trained using input text that is tokenized, including splitting numbers into individual digits. Parameters of the trained base model are used to initialize a custom model that is fine-tuned using training data that has been augmented to include annotations, e.g., different values and units for each measurement in the source text. The trained custom model can deliver correct number translations and unit conversions and can be used for content localization.
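The digit-splitting step mentioned in the abstract can be illustrated with a short sketch. The disclosure does not name a tokenizer, so the whitespace pre-tokenization and the function name `split_digits` below are assumptions; the only point is that each digit becomes its own token.

```python
import re


def split_digits(text):
    """Whitespace-tokenize text, then emit every digit as a separate token.

    Hypothetical helper: runs of non-digit characters inside a word stay
    together, while each digit is split off on its own.
    """
    tokens = []
    for word in text.split():
        # Match either a single digit or a maximal run of non-digit characters.
        tokens.extend(re.findall(r"\d|[^\d\s]+", word))
    return tokens


if __name__ == "__main__":
    print(split_digits("The farm covers 125 acres."))
    # ['The', 'farm', 'covers', '1', '2', '5', 'acres.']
```

Splitting at the digit level keeps the number vocabulary down to ten symbols, which is one common motivation for this kind of tokenization; whether that is the authors' rationale is not stated in the abstract.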