3 research outputs found

    Integrating optical character recognition and machine translation of historical documents

    Machine Translation (MT) plays a critical role in expanding capacity in the translation industry. However, many valuable documents, including digital documents, are encoded in formats that are not accessible to machine processing (e.g., historical or legal documents). Such documents must first pass through Optical Character Recognition (OCR) to render the text suitable for MT. No matter how good the OCR is, this process introduces recognition errors, which often render MT ineffective. In this paper, we propose a new OCR-to-MT framework that adds an OCR error correction module to enhance the overall quality of translation. Experiments show that our new correction system, based on a combination of language modeling and translation methods, outperforms the baseline system by nearly 30% in relative terms.

    Avaluació de la traducció automàtica d'imatges mitjançant sistemes de reconeixement de text en dispositius mòbils: Google Translate Images i Microsoft Translator Images

    This paper presents an evaluation of machine translation output from the mobile image translation applications Google Translate Images and Microsoft Translator Images for the English-Spanish language pair. It assesses their potential usefulness for the general public and for translation professionals. The research presents a theoretical framework that approaches the concept of quality in the translation industry, and more specifically in machine translation, along with the different applications of artificial intelligence in image translation. In the practical section, the MT results from Google and Microsoft are evaluated and compared on a sample of randomly selected images.

    Integrating optical character recognition and machine translation of historical documents

    No full text
    Machine Translation (MT) plays a critical role in expanding capacity in the translation industry. However, many valuable documents, including digital documents, are encoded in formats that are not accessible to machine processing (e.g., historical or legal documents). Such documents must first pass through Optical Character Recognition (OCR) to render the text suitable for MT. No matter how good the OCR is, this process introduces recognition errors, which often render MT ineffective. In this paper, we propose a new OCR-to-MT framework that adds an OCR error correction module to enhance the overall quality of translation. Experiments show that our new correction system, based on a combination of language modeling and translation methods, outperforms the baseline system by nearly 30% in relative terms.